Information processing device, information processing method, and program

ABSTRACT

A device and method capable of performing image following type audio control or image non-following type audio control are implemented. Images in different directions are selectively displayed on the display unit, and an output audio is controlled in accordance with an image display. A data processing unit executes image following type audio control of moving an audio source direction in accordance with movement of the display image of the display unit and image non-following type audio control of not moving the audio source direction in accordance with the movement of an image in units of individual controllable audio elements. The data processing unit acquires audio control information from an MP4 file or a media presentation description (MPD) file and executes either the image following type audio control or the image non-following type audio control in accordance with the acquired audio control information in units of individual controllable audio elements.

CROSS REFERENCE TO PRIOR APPLICATION

This application is a National Stage Patent Application of PCT International Patent Application No. PCT/JP2016/071111 (filed on Jul. 19, 2016) under 35 U.S.C. § 371, which claims priority to Japanese Patent Application No. 2015-155740 (filed on Aug. 6, 2015), which are all hereby incorporated by reference in their entirety.

TECHNICAL FIELD

The present disclosure relates to an information processing device, an information processing method, and a program. More specifically, the present disclosure relates to an information processing device, an information processing method, and a program, which are capable of controlling an output audio according to a display image in an image display configuration capable of observing images in various directions such as a celestial sphere image, an omnidirectional image, or a panorama image.

BACKGROUND ART

In recent years, imaging devices capable of capturing images in various directions such as a celestial sphere image, an omnidirectional image, or a panorama image have been developed, and systems in which the image captured using such an imaging device is displayed on a display unit of a PC, a tablet terminal, a mobile terminal, a head mount display (HMD), or the like, and an image selected by the user or an image automatically selected in accordance with a direction of the user can be observed are widely used.

For example, in the PC or the like, it is possible to acquire video (moving image) data of an omnidirectional image of a 360° range from an external server or read it from a recording medium and cause it to be displayed on the display device. The user is able to select an image in an arbitrary direction, cause the selected image to be displayed on the display device, and observe an image such as a moving image or a still image while changing a viewpoint freely.

The image displayed on the display unit of the PC, the tablet terminal, or the mobile terminal can be displayed in an observation direction moved by a mouse operation of the user or a slide process, a flick process, or the like performed on a touch panel, and the user is able to easily enjoy the image in various directions.

In a case where an image is displayed on the head mount display (HMD), it is possible to display an image according to a direction of the head of the user in accordance with sensor information obtained by detecting a motion or a direction of the head wearing the HMD, and the user is able to feel as if the user were in the image displayed on the display unit of the HMD.

Such image display devices mostly have a function of outputting an audio together with an image.

In most devices of the related art which output an image and an audio, one of the following schemes (a) and (b) is employed as an audio output control scheme:

(a) An image following type audio control scheme in which control is performed such that an audio listening direction is moved in accordance with movement of an observation image so as to follow the observation image.

(b) An image non-following type audio control scheme in which control is performed such that an audio listening direction is fixed regardless of movement of an observation image.

As described above, as the audio control scheme in the device of the related art, either (a) the image following type audio control scheme or (b) the image non-following type audio control scheme is often employed.

Further, (a) the image following type audio control scheme is disclosed, for example, in Patent Document 1 (Japanese Patent Application Laid-Open No. 2002-345097).

As an audio output together with an image, for example, in addition to an audio generated from a subject (object) included in the image, audios which are not generated by the subject in the image, such as narration explaining the image, comments, BGM, and the like, are included.

In the case of the audio generated from the subject in the image, a realistic feeling increases when the audio listening direction is moved with the movement of the image.

On the other hand, in the case of the audios which are not generated by the subject in the image, such as narration explaining the image, comments, and BGM, they are more comfortable to hear if they are heard consistently from a fixed direction.

However, if control is performed such that an audio to follow an image is distinguished from an audio not to follow an image, the process becomes complicated, and it is difficult to implement the control.

CITATION LIST

Patent Document

-   Patent Document 1: Japanese Patent Application Laid-Open No. 2002-345097

SUMMARY OF THE INVENTION

Problems to be Solved by the Invention

The present disclosure is made, for example, in light of the foregoing, and it is desirable to provide an information processing device, an information processing method, and a program which are capable of implementing audio source direction control for audios to be output together with an image in an image display device which outputs images in various directions, such as an omnidirectional image, in accordance with an operation or motion of the user.

Specifically, for example, it is desirable to provide an information processing device, an information processing method, and a program which are capable of implementing display image following type control or display image non-following type control in units of individual controllable audio elements such as audio streams, audio channels, and audio output objects which are individually controllable.

Solutions to Problems

A first aspect of the present disclosure lies in

an information processing device, including:

a display unit that is able to selectively display images in different directions; and

a data processing unit that controls an audio to be output to the display unit together with an image display,

in which the data processing unit executes,

in units of individual controllable audio elements,

image following type audio control of moving an audio source direction in accordance with movement of the display image of the display unit and

image non-following type audio control of not moving the audio source direction in accordance with the movement of the display image of the display unit.

Further, a second aspect of the present disclosure lies in

a data delivery server, including:

a data processing unit that generates a file storing

image data including images in different directions which are selectively displayable,

audio data to be output together with a display image which is selected from the image data and displayed, and

audio control information indicating any one of image following type audio control and image non-following type audio control which is executed in units of individual controllable audio elements,

the image following type audio control being executed such that an audio source direction is moved in accordance with movement of the display image,

the image non-following type audio control being executed such that the audio source direction is not moved in accordance with the movement of the display image; and

a communication unit that transmits the file generated by the data processing unit.

Further, a third aspect of the present disclosure lies in

an information recording medium storing

image data including images in different directions which are selectively displayable,

audio data to be output together with a display image which is selected from the image data and displayed, and

audio control information indicating any one of image following type audio control and image non-following type audio control which is executed in units of individual controllable audio elements,

the image following type audio control being executed such that an audio source direction is moved in accordance with movement of the display image,

the image non-following type audio control being executed such that the audio source direction is not moved in accordance with the movement of the display image,

in which a reproducing device that reproduces read data from the information recording medium executes any one of the image following type audio control and the image non-following type audio control in units of individual controllable audio elements in accordance with the audio control information.

Further, a fourth aspect of the present disclosure lies in

an information processing method of controlling output audio in an information processing device,

the information processing device including

a display unit that is able to selectively display images in different directions and

a data processing unit that controls an audio to be output to the display unit together with an image display,

the information processing method including:

executing, by the data processing unit, in units of individual controllable audio elements,

image following type audio control of moving an audio source direction in accordance with movement of the display image of the display unit and

image non-following type audio control of not moving the audio source direction in accordance with the movement of the display image of the display unit.

Further, a fifth aspect of the present disclosure lies in

a program causing an information processing device to control an output audio,

the information processing device including

a display unit that is able to selectively display images in different directions, and

a data processing unit that controls an audio to be output to the display unit together with an image display,

the program causing the data processing unit to execute:

in units of individual controllable audio elements,

image following type audio control of moving an audio source direction in accordance with movement of the display image of the display unit and

image non-following type audio control of not moving the audio source direction in accordance with the movement of the display image of the display unit.

Further, for example, a program of the present disclosure is a program which can be provided by a storage medium or a communication medium which is provided to an information processing device or a computer system capable of executing various program codes in a computer readable format. Since the program is provided in a computer readable format, a process according to the program is implemented on the information processing device or the computer system.

Still other objects, features, and advantages of the present disclosure will become apparent from further detailed description based on embodiments of the present disclosure to be described later or the accompanying drawings. Further, in this specification, a term “system” indicates a logical aggregate configuration of a plurality of devices and is not limited to a configuration in which devices of respective configurations are in the same housing.

Effects of the Invention

According to a configuration of one embodiment of the present disclosure, a device and a method which are capable of performing image following type audio control in which an audio source direction follows movement of a display image of a display unit or image non-following type audio control in units of individual audio elements are implemented.

Specifically, images in different directions are selectively displayed on the display unit, and an output audio is controlled in accordance with an image display. The data processing unit executes image following type audio control of moving an audio source direction in accordance with movement of the display image of the display unit and image non-following type audio control of not moving the audio source direction in accordance with the movement of an image in units of individual controllable audio elements. The data processing unit acquires audio control information from an MP4 file or a media presentation description (MPD) file and executes either the image following type audio control or the image non-following type audio control in units of audio elements in accordance with the acquired audio control information.

With this configuration, a device and a method which are capable of performing image following type audio control in which an audio source direction follows movement of a display image of a display unit or image non-following type audio control in units of individual audio elements are implemented.

Further, the effect described in this specification is merely an example and is not limiting, and additional effects may be included.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a diagram for describing examples of an image display process and an audio output process in an information processing device.

FIG. 2 is a diagram for describing examples of an image display process and an audio output process in an information processing device.

FIG. 3 is a diagram for describing examples of an image display process and an audio output process in an information processing device.

FIG. 4 is a diagram for describing a data provision process configuration for an information processing device.

FIG. 5 is a diagram for describing an ISO base media file format.

FIG. 6 is a diagram for describing an ISO base media file format.

FIG. 7 is a diagram for describing a recording example of audio control information for an MP4 file.

FIG. 8 is a diagram for describing all-audio correspondence control information.

FIG. 9 is a diagram for describing a setting example of a recording order (sequence) of audio control information.

FIG. 10 is a diagram for describing audio element correspondence control information.

FIG. 11 is a diagram for describing an example of audio control.

FIG. 12 is a diagram for describing an example of a recording region of audio control information for an MP4 file.

FIG. 13 is a diagram for describing an example of a recording region of audio control information for an MP4 file.

FIG. 14 is a diagram for describing audio control information recorded for an MP4 file.

FIG. 15 is a flowchart for describing reading of audio control information from an MP4 file and an execution sequence of an audio control process.

FIG. 16 is a flowchart for describing reading of audio control information from an MP4 file and an execution sequence of an audio control process.

FIG. 17 is a diagram for describing a data provision process configuration for an information processing device.

FIG. 18 is a diagram for describing an MPD file.

FIG. 19 is a diagram for describing an MPD file.

FIG. 20 is a diagram for describing audio control information recorded in an MPD file.

FIG. 21 is a diagram for describing a specific example of audio control information recorded in an MPD file.

FIG. 22 is a diagram for describing a specific example of audio control information recorded in an MPD file.

FIG. 23 is a diagram for describing a specific example of audio control information recorded in an MPD file.

FIG. 24 is a flowchart for describing reading of audio control information from an MPD file and an execution sequence of an audio control process.

FIG. 25 is a flowchart for describing reading of audio control information from an MPD file and an execution sequence of an audio control process.

FIG. 26 is a diagram illustrating a hardware configuration example of an information processing device.

MODE FOR CARRYING OUT THE INVENTION

Hereinafter, an information processing device, an information processing method, and a program according to the present disclosure will be described in detail with reference to the accompanying drawings. Further, the description will proceed in accordance with the following items.

1. Examples of image display control and audio output control

2. (First embodiment) embodiment in which audio control information is recorded in MP4 file

2-1. (First audio control information recording example) recording example in which audio control information of channel unit is recorded in MP4 file

2-2. (Second audio control information recording example) example in which audio control information of stream unit is recorded in MP4 file

2-3. (Third audio control information recording example) example in which information indicating that audio control is settable by user is recorded in MP4 file

3. Audio control process sequence using audio control information recorded in MP4 file

4. (Second embodiment) embodiment in which audio control information is recorded in MPD

4-1. (First audio control information recording example) recording example in which audio control information of channel unit is recorded in MPD file

4-2. (Second audio control information recording example) example in which audio control information of stream unit is recorded in MPD file

4-3. (Third audio control information recording example) example in which information indicating that audio control is settable by user is recorded in MPD file

5. Audio control process sequence using audio control information recorded in MPD file

6. Hardware configuration example of information processing device

7. Conclusion of configuration of present disclosure

1. Examples of Image Display Control and Audio Output Control

First, specific examples of image display control and audio output control in a device capable of displaying images in various directions such as a celestial sphere image, an omnidirectional image, or a panorama image on a display unit will be described with reference to FIG. 1 and subsequent drawings.

In recent years, imaging devices capable of capturing images in various directions such as a celestial sphere image, an omnidirectional image, or a panorama image have been developed, and systems in which the image captured using such an imaging device is displayed on a display unit of a PC, a tablet terminal, a mobile terminal, a head mount display (HMD), or the like, and an image in an arbitrary direction selected by the user can be observed are widely used.

For example, it is possible to acquire video (moving image) data of an omnidirectional image of a 360° range from an external server or read it from a recording medium and cause it to be displayed on the display device such as the PC of the user. The user is able to select an image in an arbitrary direction from the image data obtained from the server or the recording medium, cause the selected image to be displayed on the display device, and observe a video (moving image) or a still image while changing the viewpoint freely.

FIG. 1 is a diagram for describing an example in which images in various directions are selected and displayed on a display unit of the mobile terminal.

Image data 10 illustrated in FIG. 1 is a panorama image. An image of 360° in a horizontal direction is set as one piece of image data.

If a central part of the image data is an image of the user (observer) in the front direction (for example, 0°=the north direction), a left end of the image data 10 is an image of the user (observer) in the rear direction (−180°=the south direction), and a right end of the image data 10 is an image of the user (observer) in the rear direction (+180°=the south direction).

The left end and the right end of the image data 10 are images at the same position.
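The relation between a horizontal position in the image data 10 and a direction can be sketched as follows. This is only an illustrative sketch, assuming a linear mapping in which the central column corresponds to 0° and the left and right ends correspond to −180° and +180°; the function names and the image width used in the usage note are hypothetical and do not appear in the present disclosure.

```python
def column_to_azimuth(x: float, width: int) -> float:
    """Map a horizontal pixel position of a 360-degree panorama to degrees.

    Assumed layout (hypothetical, for illustration): the centre column is
    the front direction (0 degrees), the left end is -180 degrees and the
    right end is +180 degrees, i.e. the same rear direction.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    return (x / width) * 360.0 - 180.0


def wrap_degrees(angle: float) -> float:
    """Wrap any angle into the half-open range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0
```

For a hypothetical panorama 3600 pixels wide, `column_to_azimuth(1800, 3600)` gives the front direction, and wrapping the values for the left and right ends yields the same angle, which reflects the fact that both ends show the same position.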

Further, in the case of a celestial sphere image or an omnidirectional image, that is, a panorama image of 360°, an image of 180° is captured in an up and down direction, and images in all directions of up, down, right, and left are included.

In the following embodiment, an example using the panorama image of 360° in the horizontal direction will be described, but the configuration of the present disclosure can also be applied even in a case where a celestial sphere image or an omnidirectional image is used, and the configuration of the present disclosure can be applied in a device capable of selectively displaying images in different directions.

In the following description, the panorama image is assumed to include a panorama image of 360° in the horizontal direction, a 360° panorama image such as a celestial sphere image or an omnidirectional image, and all images in which images in different directions can be displayed by image movement.

A lower part of FIG. 1 illustrates a mobile terminal 20 which is an example of an information processing device of the user.

The display unit of the mobile terminal 20 is able to display images of some regions of the image data 10, for example, images of a region arbitrarily selected by the user.

A display image A of the mobile terminal 20 on the left side is a region image of the image section a1 to a2, which is a partial region of the image data 10.

A display image B of the mobile terminal 20 on the right side is a region image of the image section b1 to b2, which is a partial region of the image data 10.

The user is able to move the display image through a process of sliding a finger on the display unit configured as a touch panel or the like so that an image of an arbitrary region is displayed.

Further, the mobile terminal 20 is provided with a speaker 25, and outputs audio data recorded together with the display image.

FIG. 2 illustrates an example in which a panorama image is displayed using a head mount display (HMD) 30.

In a case where an image is displayed on the head mount display (HMD) 30, an image corresponding to a direction of the head of the user is displayed in accordance with sensor information obtained by detecting a motion or a direction of the head wearing the HMD. With this image display control, the user is able to feel as if the user were in the image displayed on the display unit of the HMD.

An image when the user wearing the HMD 30 faces left is a display image P.

An image when the user wearing the HMD 30 faces right is a display image Q.

The user wearing the HMD 30 is able to observe an image of a 360° range while changing the direction of the body (head).

Further, a speaker 35 is also installed in the head mount display (HMD) 30 and outputs audio data recorded together with the display image.

Next, an audio output when the panorama image display process described with reference to FIGS. 1 and 2 is executed will be described with reference to FIG. 3.

As an audio output together with an image, for example, in addition to an audio generated from a subject (object) included in the image, audios which are not generated by the subject in the image, such as narration explaining the image, comments, and BGM, are included.

FIG. 3 illustrates an example of two types of output audios:

(First audio example) dog barking (woof) (=an audio generated from a subject (object)); and

(Second audio example) BGM and narration (which are not an audio generated from a subject (object)).

The dog barking (woof) illustrated in FIG. 3 is an audio generated from the subject in the image, and the realistic feeling increases if the audio listening direction is moved in accordance with movement of an image.

It is possible to further increase the realistic feeling when “image following type” audio control is performed, that is, when a setting is made so that the dog barking (woof) is heard from a “right front” in the case of the display image A illustrated in FIG. 3, and from a “left front” in the case of the display image B.

However, in the case of audios such as BGM or narration which are not audios generated from the subject (object), they are more comfortable to hear if they are heard consistently from a fixed direction.

For example, it is preferable to perform “image non-following type” audio control of performing a setting so that the audios are heard consistently in the front direction regardless of the position of the display image.
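The difference between the two control types can be sketched as follows. This is a minimal illustration under stated assumptions, not the control process of the embodiments: the class `AudioElement`, the function `rendered_azimuth`, the sign convention (positive angles are to the right of the observer), and all angle values are hypothetical.

```python
from dataclasses import dataclass


def wrap_degrees(angle: float) -> float:
    """Wrap any angle into the half-open range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


@dataclass
class AudioElement:
    name: str
    # For a following element: direction of the source in the panorama.
    # For a non-following element: fixed direction relative to the observer.
    source_azimuth: float
    follows_image: bool


def rendered_azimuth(element: AudioElement, view_azimuth: float) -> float:
    """Direction from which the element is heard, relative to the observer.

    view_azimuth is the direction of the centre of the display image.
    """
    if element.follows_image:
        # Image following type: the source stays attached to the image,
        # so its direction shifts opposite to the movement of the view.
        return wrap_degrees(element.source_azimuth - view_azimuth)
    # Image non-following type: the direction is fixed regardless of the view.
    return wrap_degrees(element.source_azimuth)
```

With a hypothetical dog located at +30° in the panorama, a view centred on 0° renders the barking from the right front (+30°), and a view centred on +60° renders it from the left front (−30°), whereas BGM marked as non-following stays at the front for both views.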

A specific embodiment for implementing such audio control will be described below.

2. (First Embodiment) Embodiment in which Audio Control Information is Recorded in MP4 File

First, an embodiment in which audio control information is recorded in an MP4 file will be described as a first embodiment.

FIG. 4 is a diagram illustrating an information processing device 70 which executes audio control of the present disclosure according to the first embodiment, a server 50 which provides content including image data and audio data to the information processing device 70, and a medium 60.

For example, image data such as a celestial sphere image, an omnidirectional image, or a panorama image and audio data are provided from the server 50 illustrated in FIG. 4 to the information processing device 70. Alternatively, the image data and the audio data are provided from the medium 60 illustrated in FIG. 4 to the information processing device 70.

The server 50 includes, for example, a broadcasting server 51 of a broadcasting station or the like and other data providing servers 52.

The content is transmitted to the information processing device 70 via a broadcast wave or a network such as the Internet.

The information processing device 70 receives and reproduces the content transmitted from the server 50 via a broadcast wave or a network such as the Internet.

Further, the medium 60 includes various media such as a disk, a flash memory, a hard disk, and the like, which are loaded into the information processing device.

The information processing device 70 reads and reproduces the content recorded in the medium.

An information processing device which performs content reproduction is, for example, a TV 71, a PC 72, a mobile terminal 73, a head mount display (HMD) 74, or the like and includes an image display unit and an audio output unit (speaker).

The content provided from the server 50 or the medium 60 to the information processing device 70 is content including audio data and image data in which images in various directions can be selectively displayed, such as a celestial sphere image, an omnidirectional image, or a panorama image.

The content is stored, for example, in an MP4 file 81 and provided.

The MP4 file 81 is a file in which data is recorded in accordance with the ISO base media file format.

The ISO base media file format is a data format which is defined by ISO/IEC 14496-12 and is suitable, for example, as a format for data recorded in a flash memory or the like, or for data stored in a file transmitted via a broadcast wave or a network.

The ISO base media file format is used, for example, when encoded data which is content configuration data such as an image (Video), an audio (Audio), and a subtitle (Subtitle) or metadata (attribute information) related to the data is recorded in a recording medium (medium). Further, the ISO base media file format is also used as a data storage format of data transmitted via a broadcast wave or a network.

In recent years, many mobile terminals have a reproducing application capable of reproducing MP4 data recorded in accordance with the ISO base media file format, and in a case where content is recorded in a medium of a mobile terminal, it is often requested to record the content in an MP4 format.

An overview of the ISO base media file format will be described with reference to FIGS. 5 and 6.

FIG. 5 illustrates an example of the ISO base media file format specified in ISO/IEC 14496-12.

An MP4 file illustrated in FIG. 5 is a file set as one processing unit in a data recording or reproduction process according to the ISO base media file format.

In the MP4 file, regions of box units are set, and data defined in units of boxes is stored in each box.

Each box has regions of a box size (box-size), a box type (box-type), and box data (box-data).

A data length (byte size) of the box is recorded in the box size (box-size).

A type of data stored in the box is recorded in the box type (box-type).

Data of a type indicated by the box type is recorded in the box data (box-data).
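The box layout described above (box size, box type, box data) can be walked with a short routine such as the following. This is a hedged sketch rather than a full ISO base media file format parser: it relies on the 32-bit big-endian size followed by a four-character type defined in ISO/IEC 14496-12, handles the 64-bit size (size field equal to 1) and the "extends to end" case (size field equal to 0), and ignores everything else. The helper names `iter_boxes` and `make_box` are hypothetical.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, offset: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, bytes]]:
    """Yield (box_type, box_data) for each box found in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        # box-size (4 bytes, big-endian) followed by box-type (4 characters)
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit size follows the type field.
            if offset + 16 > end:
                raise ValueError("truncated 64-bit box size")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the enclosing region.
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError("malformed box at offset %d" % offset)
        yield box_type.decode("ascii", errors="replace"), data[offset + header:offset + size]
        offset += size


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a box with a 32-bit size field (illustration helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload
```

Because the box data of a container box is itself a sequence of boxes, calling `iter_boxes` again on the data of one box lists the boxes nested inside it.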

The following types of boxes are set in the MP4 file illustrated in FIG. 5:

moov box;

trak box; and

mdat box.

Each of the above boxes is set.

Actual data which is reproduction target data such as an image, an audio, and a subtitle is stored in the mdat box.

Further, metadata such as attribute information and reproduction control information related to data stored in the mdat box is stored in the trak box in the moov box.

The moov box is a box set as a storage region of the metadata (the reproduction control information and the attribute information) of the data stored in the mdat box of the MP4 file.

One or more trak boxes are set in the moov box. The trak box can be set for each data type such as an image, an audio, and a subtitle, for example, and stores the metadata of each data.

A data storage configuration example for the MP4 file will be described with reference to FIG. 6. The following boxes are set in the MP4 file as described above:

moov box;

trak box; and

mdat box.

Each of the above boxes is set.

For example, the following data is stored in the mdat box:

(a) image;

(b) audio; and

(c) subtitle.

The data stored in the mdat box which is a data part of the ISO base media file format is divided into samples serving as a basic data unit.

A set of the same kind of data samples, such as a set of only image samples, a set of only audio samples, or a set of only subtitle samples, is stored in one mdat box.

The moov box is a storage region of the metadata (the reproduction control information and the attribute information) of the data stored in the mdat box of the MP4 file.

One or more trak boxes are set in the moov box. The trak box can be set for each data type such as an image, an audio, and a subtitle, and stores the metadata of each data.

A trak (Video) box illustrated in FIG. 6 is an image correspondence metadata storage box which stores attribute information and control information related to the image data.

A trak (Audio) box is an audio correspondence metadata storage box which stores attribute information and control information related to audio data.

A trak (Subtitle) box is a subtitle correspondence metadata storage box which stores attribute information and control information related to subtitle data.

Further, in a case where a plurality of different pieces of image data, for example, a 2K image, a 4K image, and the like are included in the reproduced data stored in the MP4 file, it is possible to record control information of an image type unit in the trak (Video) box.

Further, in a case where a plurality of different pieces of audio data, for example, a Japanese audio, an English audio, and the like are included in the audio data stored in the MP4 file, it is possible to record individual control information of an audio channel unit corresponding to the audio types into individual trak (Audio) boxes.

Further, for the BGM, the narration, the subject (object) audio, and the like, it is also possible to record individual control information of each audio channel (including an audio output object) unit in the trak (Audio) box.

Further, it is also possible to set the individual trak boxes in accordance with, for example, an audio channel corresponding to each output speaker.

For example, it is possible to record two pieces of control information corresponding to the output audios output from two left and right speakers corresponding to a stereo output in the trak (Audio) box.

Further, in the case of a 5.1 ch surround audio, the following six speakers are set:

a center front (Center Front) speaker;

a left front (Left Front) speaker;

a right front (Right Front) speaker;

a left surround (Left Surround) speaker;

a right surround (Right Surround) speaker; and

a low frequency effect (low frequency enhancement: LFE) speaker.

In the case of the 5.1 ch surround audio, six audio channels which are output audios to be output to the six speakers are recorded in the MP4 file.

It is possible to record six pieces of control information corresponding to the six audio channels (audio elements) in the trak (Audio) box.

If the control information of such an audio element unit is recorded, it is possible to individually control the output audio of each speaker.

Thus, it is possible to record individual control information of each of individually controllable audio elements such as an audio type, an audio output object, and an audio channel which is distinguished by an audio output speaker or the like in the trak box.

It is possible to perform the individual audio control of the audio element unit in accordance with the control information of the audio element unit recorded in the trak box.

Next, a specific control information recording example corresponding to an audio recorded in the trak (Audio) box will be described with reference to FIG. 7.

The control information recorded in the trak (Audio) box is recorded as data illustrated in FIG. 7.

In other words, it is the following data.

  aligned(8) class NoTrackingAudio extends FullBox(‘NTRK’){
    unsigned int(8) no_tracking_flags;
    if(no_tracking_flags & Some_Channel){
      unsigned int(8) count; // channel
      for (i=1; i<=count; i++){
        unsigned int(1) NoTracking;
      }
      aligned(8);
    }
  }

“no_tracking_flags” of the control data is the “all-audio correspondence control information 91” illustrated in FIG. 7.

A setting value of the “all-audio correspondence control information 91” is information indicating a general control form for all audio elements such as all audio channels or all audio output objects stored in the MP4 file.

An example of a correspondence relation between the setting value (flag value) set in “no_tracking_flags” which is the “all-audio correspondence control information 91” and the control form of the audio is illustrated in FIG. 8.

As illustrated in FIG. 8, correspondence between the setting value (flag value) and the audio control form is as follows:

setting value=0: all audios are caused to follow a display image (All channels can be tracked);

setting value=1: all audios are caused not to follow a display image (All channels are not tracked);

setting value=2: a display image following audio and a non-following audio are mixed (Some channels can be tracked); and

setting value=4: display image following audio and non-following audio are settable by the user (User selected channels can be tracked).

In a case where the setting value of the “all-audio correspondence control information 91” (no_tracking_flags) is 0, control is performed such that all individual controllable audio elements stored in the MP4 file are caused to follow the display image.

In other words, in a case where the display image moves, a process of moving the audio source direction to follow the movement is performed. This is the “display image following type audio control.”

The “display image following type audio control” is the audio source direction control of the dog barking (woof) in the example described above with reference to FIG. 3. In other words, in the example of FIG. 3, the process of moving the audio source direction of the dog barking (woof) to follow the display image has been described.

In a case where the setting value of the “all-audio correspondence control information 91” (no_tracking_flags) is 0 in a configuration including a plurality of audio elements described above with reference to FIG. 3, control of moving all audios, including the BGM and the narration as well as the audio of the dog barking (woof), in accordance with movement of a display screen is performed.

In a case where the setting value of the “all-audio correspondence control information 91” (no_tracking_flags) is 1, control is performed such that all individual controllable audio elements stored in this MP4 file are caused not to follow the display image.

In other words, in a case where the display image moves, the process of moving the audio source direction to follow the movement is not performed. This is the “display image non-following type audio control.”

The “display image non-following type audio control” is the audio source direction control of the audio such as the BGM or the narration in the example described above with reference to FIG. 3. In other words, in the example of FIG. 3, the control of performing a setting so that the audios such as the BGM and the narration are heard from a fixed audio source direction, for example, consistently in the front direction without following the display image has been described.

In a case where the setting value of the “all-audio correspondence control information 91” (no_tracking_flags) is 1 in the configuration including a plurality of audio elements described with reference to FIG. 3, the audio control of not moving any audio, including the audio of the dog barking (woof) as well as the audios such as the BGM and the narration, in accordance with the movement of the display screen is performed.

In a case where the setting value of the “all-audio correspondence control information 91” (no_tracking_flags) is 2, it indicates that the display image following audio and the display image non-following audio are mixed in all the individual controllable audio elements stored in this MP4 file.

In this case, which of the “display image following type audio control” and the “display image non-following type audio control” is performed on each audio element is decided from a loop processing portion 92 illustrated in FIG. 7 with reference to control information corresponding to an audio element (i), that is, “audio element (i) correspondence control information (NoTracking)” illustrated in FIG. 7.

A process of acquiring the control information corresponding to the audio element (i) based on recording information of the loop processing portion 92 will be described later.

In a case where the setting value of the “all-audio correspondence control information 91” (no_tracking_flags) is 4, it indicates that the user is able to set the display image following audio and the display image non-following audio for all the individual controllable audio elements stored in the MP4 file.

Next, a process of acquiring the control information corresponding to the audio element (i) on the basis of the recording information of the loop processing portion 92 in a case where the setting value of the “all-audio correspondence control information 91” (no_tracking_flags) is 2 will be described.

In a case where the setting value of the “all-audio correspondence control information 91” (no_tracking_flags) is 2, it indicates that the display image following audio and the display image non-following audio are mixed in all the individual controllable audio elements stored in the MP4 file.

In this case, which of the “display image following type audio control” and the “display image non-following type audio control” is performed on each audio element is decided from the loop processing portion 92 illustrated in FIG. 7 with reference to the control information corresponding to the audio element (i), that is, the “audio element (i) correspondence control information (NoTracking)” illustrated in FIG. 7.

Information indicating whether the individual controllable audio element is an execution target of the “display image following type audio control” or an execution target of the “display image non-following type audio control” is recorded in the loop processing portion 92 for all the individual controllable audio elements stored in the MP4 file.

The number of all audio elements is recorded in the number of channels (count) 94.

The control information for each element (i), that is, information indicating whether the audio element (i) is an execution target of the “display image following type audio control” or an execution target of the “display image non-following type audio control,” is recorded in the loop processing portion 92 for audio element identifiers i=1 to count.
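The reading of the data illustrated in FIG. 7 can be sketched as follows. This is a minimal illustration and not the implementation of the information processing device 70. It assumes that the input is the part of the box that follows the FullBox version and flags fields, that Some_Channel corresponds to the setting value 2 of FIG. 8, and that the 1-bit NoTracking values are packed starting from the most significant bit of each byte. The names NoTrackingAudioInfo and parse_no_tracking_audio are hypothetical.

```python
from dataclasses import dataclass
from typing import List

SOME_CHANNEL = 2  # assumption: Some_Channel equals setting value 2 of FIG. 8


@dataclass
class NoTrackingAudioInfo:
    no_tracking_flags: int   # all-audio correspondence control information
    no_tracking: List[int]   # one 0/1 value per audio element (empty if no loop is present)


def parse_no_tracking_audio(payload: bytes) -> NoTrackingAudioInfo:
    """Parse the fields that follow the FullBox header of the 'NTRK' box."""
    if len(payload) < 1:
        raise ValueError("payload is empty")
    flags = payload[0]
    bits: List[int] = []
    if flags & SOME_CHANNEL:
        if len(payload) < 2:
            raise ValueError("count field is missing")
        count = payload[1]
        # count 1-bit fields, padded up to a byte boundary by aligned(8)
        if len(payload) < 2 + (count + 7) // 8:
            raise ValueError("payload too short for the declared count")
        for i in range(count):
            byte = payload[2 + i // 8]
            bits.append((byte >> (7 - i % 8)) & 1)  # assumption: MSB-first packing
    return NoTrackingAudioInfo(flags, bits)


# Six audio elements, only the first one marked as non-following.
info = parse_no_tracking_audio(bytes([2, 6, 0b10000000]))
print(info.no_tracking_flags, info.no_tracking)
```

Under these assumptions, a setting value other than 2 yields an empty per-element list, because the loop processing portion is then not present in the data.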

Further, the recording order of the audio element correspondence control information in the loop processing portion 92 differs depending on stored audio data. For example, an order determined by ISO/IEC 23001-8 Channel Configuration is used.

In this case, the audio element correspondence control information associated with the output channel of each audio output speaker is sequentially recorded in the loop processing portion 92 in accordance with a sequence specified in ISO/IEC 23001-8.

An example of the recording order of the audio element correspondence control information according to the sequence recorded in ISO/IEC 23001-8 will be described with reference to FIG. 9.

In the case of the MP4 file in which a stereo audio is stored, the number of output channels=the number of output speakers=2, and the number of individual controllable audio elements (the number of channels)=2. In this case, the number of records of the audio element correspondence control information in the loop processing portion 92 illustrated in FIG. 7 is 2, and count=2.

In this case, the following control information is recorded in the loop processing portion 92 illustrated in FIG. 7 in the described order:

a first audio element=control information of an output channel of a left front speaker; and

a second audio element=control information of an output channel of a right front speaker.

In other words, the “audio element (i) correspondence control information (NoTracking)” indicating whether each audio element is the execution target of the “display image following type audio control” or the execution target of the “display image non-following type audio control” is recorded in the order of the first audio element and the second audio element.

Further, in the case of the MP4 file storing the 5.1 channel surround audio, the number of channels=the number of output speakers=6, and the number of individual controllable audio elements (the number of channels)=6. In this case, the number of records of the audio element correspondence control information in the loop processing portion 92 illustrated in FIG. 7 is 6, and count=6.

In this case, the following control information is recorded in the loop processing portion 92 illustrated in FIG. 7 in the described order:

a first audio element=control information of an output channel of a center front (Center Front) speaker;

a second audio element=control information of an output channel of a left front (Left Front) speaker;

a third audio element=control information of an output channel of a right front (Right Front) speaker;

a fourth audio element=control information of an output channel of a left surround (Left Surround) speaker;

a fifth audio element=control information of an output channel of a right surround (Right Surround) speaker; and

a sixth audio element=control information of an output channel of a low frequency effect (LFE) speaker.

In other words, the “audio element (i) correspondence control information (NoTracking)” indicating whether each audio element is the execution target of the “display image following type audio control” or the execution target of the “display image non-following type audio control” is recorded in the order of the first to sixth audio elements.

The example described with reference to FIG. 9 is an example in which the controllable audio element is associated with the output channel of each speaker, and the audio element correspondence control information is recorded in accordance with the sequence recorded in ISO/IEC 23001-8.

In addition to this example, the individual controllable audio element stored in the MP4 file has various settings, and recording order sequences corresponding to various audio elements according to the settings are specified.

Control information corresponding to each audio element (i), that is, “audio element (i) correspondence control information (NoTracking) 93” illustrated in FIG. 7 is recorded in the loop processing portion 92 in the specified order. In other words, information indicating whether each audio element is a target of the “display image following type audio control” or a target of the “display image non-following type audio control” is recorded.

Further, it is desirable that the recording order information be separately provided to the information processing device 70.

A specific example of the “audio element (i) correspondence control information (NoTracking) 93” recorded in the loop processing portion 92 will be described with reference to FIG. 10.

An example of a correspondence relation between the setting value set in the “audio element (i) correspondence control information (NoTracking) 93” and the audio control form is illustrated in FIG. 10.

As illustrated in FIG. 10, correspondence between the setting value and the control form of audio is as follows:

a setting value=0: the audio element (i) is caused to follow the display image (the channel can be tracked); and

a setting value=1: the audio element (i) is caused not to follow the display image (the channel is not tracked).

In a case where the setting value of the “audio element (i) correspondence control information (NoTracking) 93” is 0, control is performed such that the audio element (i) stored in the MP4 file is caused to follow the display image.

In other words, in a case where the display image moves, a process of moving the audio source direction to follow the movement is performed. This is the “display image following type audio control.”

In the “display image following type audio control,” similarly to the audio source direction control of the dog barking (woof) in the example described above with reference to FIG. 3, in a case where the display image moves, the process of moving the audio source direction to follow the movement is performed.

In a case where the setting value of the “audio element (i) correspondence control information (NoTracking) 93” is 1, control is performed such that the audio element (i) stored in the MP4 file is caused not to follow the display image.

In other words, in a case where the display image moves, the process of moving the audio source direction to follow the movement is not performed. This is the “display image non-following type audio control.”

In the “display image non-following type audio control,” similarly to the audio source direction control of the audio such as the BGM or the narration in the example described above with reference to FIG. 3, even when the display image moves, the audio source direction control of causing the audio not to follow the movement is performed.

The value [0] or [1] of the audio element (i) correspondence control information (NoTracking) shown in the table of FIG. 10 is stored in the loop processing portion 92 illustrated in FIG. 7 as the setting value of each piece of audio element (i) correspondence control information.

An example of control based on the setting value of each piece of audio element (i) correspondence control information recorded in the loop processing portion 92 illustrated in FIG. 7 will be described with reference to FIG. 11.

FIG. 11 is a diagram illustrating a control example in the case of the MP4 file storing the 5.1 channel surround audio described above with reference to FIG. 9.

In the case of the MP4 file storing the 5.1 channel surround audio, the number of channels=the number of output speakers=6, and the number of individual controllable audio elements (the number of channels)=6. In this case, the number of records of the audio element correspondence control information in the loop processing portion 92 illustrated in FIG. 7 is 6, and count=6.

In this case, the following control information is recorded in the loop processing portion 92 illustrated in FIG. 7 in the described order:

a first audio element=control information of an output channel of a center front speaker;

a second audio element=control information of an output channel of a left front speaker;

a third audio element=control information of an output channel of a right front speaker;

a fourth audio element=control information of an output channel of a left surround speaker;

a fifth audio element=control information of an output channel of a right surround speaker; and

a sixth audio element=control information of an output channel of a low frequency effect (LFE) speaker.

The control example illustrated in FIG. 11 is an example of control in a case where the setting value of the “audio element (i) correspondence control information (NoTracking) 93” recorded in the loop processing portion 92 illustrated in FIG. 7 has the following setting:

a setting value of control information of a first audio element (the output channel of the center front speaker)=1;

a setting value of control information of a second audio element (the output channel of the left front speaker)=0;

a setting value of control information of a third audio element (the output channel of the right front speaker)=0;

a setting value of control information of a fourth audio element (the output channel of the left surround speaker)=0;

a setting value of control information of a fifth audio element (the output channel of the right surround speaker)=0; and

a setting value of control information of a sixth audio element (the output channel of the low frequency effect (LFE) speaker)=0.

The above setting values indicate that the audio control of causing the audio not to follow the movement of the display image, that is, the “display image non-following type audio control” is performed only on the first audio element (the output channel of the center front speaker), and the audio control of causing the audio to follow the movement of the display image, that is, the “display image following type audio control” is performed on the second to sixth audio elements.

Specifically, this corresponds to a setting in which, for example, the BGM or the narration is output from the first audio element (the output channel of the center front speaker), and the audios of the subjects in the display image are output from the other speakers.
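The setting of FIG. 11 can be written out as a small lookup, shown here only as an illustrative sketch. The list CHANNEL_ORDER_5_1 follows the recording order described with reference to FIG. 9, and the function name control_forms is hypothetical.

```python
from typing import Dict, List

# Recording order of the 5.1 ch audio elements as described with reference to FIG. 9.
CHANNEL_ORDER_5_1 = [
    "Center Front",
    "Left Front",
    "Right Front",
    "Left Surround",
    "Right Surround",
    "LFE",
]


def control_forms(no_tracking: List[int], names: List[str]) -> Dict[str, str]:
    """Map each audio element to its control form (NoTracking 0 = following, 1 = non-following)."""
    if len(no_tracking) != len(names):
        raise ValueError("one NoTracking value is required per audio element")
    return {
        name: "non-following" if value == 1 else "following"
        for name, value in zip(names, no_tracking)
    }


# Setting values of the FIG. 11 example: only the center front channel is non-following.
forms = control_forms([1, 0, 0, 0, 0, 0], CHANNEL_ORDER_5_1)
for name, form in forms.items():
    print(f"{name}: display image {form} type audio control")
```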

A user (observer) 101 illustrated in FIG. 11 wears a head mount display (HMD) and observes an omnidirectional image or a panorama image. Control is performed such that an observation image moves in accordance with the direction of the head of the user.

Further, the six speakers illustrated in FIG. 11 are virtual speakers and do not actually exist.

The speaker is installed in the HMD worn by the user 101 and is configured to output a pseudo 5.1 ch surround audio through headphones of the left and right ears.

Here, six individual controllable audio elements corresponding to output audios of the six speakers corresponding to the 5.1 ch surround are recorded in the MP4 file and controlled in accordance with the audio element correspondence control information.

In (A) a user (observer) front direction setting illustrated in FIG. 11, the BGM and the narration are set to be heard from the virtual center front speaker (Center Front) in the front.

The center front speaker (Center Front) is the first audio element which outputs the BGM and the narration.

Other audios, for example, audios output from the subjects in the observation image such as the dog barking, are set to be heard from the other speakers.

The other speakers are the second to sixth audio elements which output the subject audios and the like.

In the example illustrated in FIG. 11 (A), the dog barking is heard from the left front (Left Front) speaker.

Then, if the user 101 rotates the body to (B) a user (observer) right direction setting illustrated in FIG. 11, an image displayed on the HMD also moves with the rotation.

However, the first audio element (the output channel of the center front speaker), which outputs the BGM or the narration, is an audio element not following the display image. In other words, the direction in which the BGM or the narration is heard remains the same relative to the user, and the relative position relation between the audio source and the user is not changed.

Therefore, even when the user 101 rotates the body to (B) the user (observer) right direction setting, the BGM and the narration are set to be heard from the front of the user, that is, from the right side in FIG. 11.

Thus, an effect similar to that obtained when the first audio element (center front speaker) rotates together with the user is obtained.

On the other hand, for example, the second to sixth audio elements corresponding to the outputs from the other speakers such as the dog barking are audio elements following the display image. In other words, the direction in which the subject audio such as the dog barking (woof) is heard moves with the movement of the observation image of the user. In this case, the relative position relation between the audio source direction and the user is changed.

If the user 101 rotates the body to (B) the user (observer) right direction setting, the dog barking is set to be heard from the left rear of the user, that is, from the virtual left front (Left Front) speaker, whose position does not rotate with the user.
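The change of the relative position relation in FIG. 11 can be illustrated numerically. The following is a sketch under stated assumptions and not the control executed by the information processing device 70: the azimuth values of the virtual speakers (0 degrees for the center front speaker, plus or minus 30 degrees for the front speakers, plus or minus 110 degrees for the surround speakers, positive to the right) are assumed illustrative values that do not appear in the description, and the function name direction_relative_to_user is hypothetical.

```python
# Assumed azimuths (degrees, positive = to the right of the initial front direction).
# These values are illustrative assumptions; the LFE channel is given 0 for simplicity.
SPEAKER_AZIMUTH = {
    "Center Front": 0.0,
    "Left Front": -30.0,
    "Right Front": 30.0,
    "Left Surround": -110.0,
    "Right Surround": 110.0,
    "LFE": 0.0,
}


def wrap_degrees(angle: float) -> float:
    """Wrap an angle into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def direction_relative_to_user(initial_azimuth: float, no_tracking: int, user_yaw: float) -> float:
    """Direction from which an audio element is heard, relative to the user's current front.

    user_yaw is the rotation of the user since the front direction setting (positive = right).
    """
    if no_tracking == 1:
        # Non-following: the source stays at the same direction relative to the user.
        return wrap_degrees(initial_azimuth)
    # Following: the source stays with the image, so it shifts opposite to the user's rotation.
    return wrap_degrees(initial_azimuth - user_yaw)


# User turns 90 degrees to the right, as in setting (B) of FIG. 11.
print(direction_relative_to_user(SPEAKER_AZIMUTH["Center Front"], 1, 90.0))
print(direction_relative_to_user(SPEAKER_AZIMUTH["Left Front"], 0, 90.0))
```

With these assumed angles, the narration stays at the front of the user, while the dog barking recorded for the left front speaker is heard from behind and to the left of the user after the rotation.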

As described above, the information processing device 70 executes control of each audio element on the basis of the recorded value of the audio element correspondence control information recorded in the loop processing portion 92 illustrated in FIG. 7.

The audio control information illustrated in FIG. 7 is recorded in the trak box which is the control information (metadata) recording region corresponding to the audio (Audio) of the MP4 file described above with reference to FIGS. 5 and 6.

It is possible to record various control information in the trak box which is an audio control information recording region.

Two examples of the recording positions set in the trak box in which the audio control information illustrated in FIG. 7 is recorded will be described with reference to FIGS. 12 and 13.

(First Control Information Storage Example)

A first control information storage example illustrated in FIG. 12 will be described.

The example illustrated in FIG. 12 is an example in which an audio control information (NoTrackingAudio) record box is set as a lower box in an audio sample entry (AudioSampleEntry) storing codec information and the like in the trak box serving as an audio control information storage box of the MP4 file.

The control information illustrated in FIG. 7 is recorded in the audio control information (NoTrackingAudio) record box illustrated in FIG. 12.

(Second Control Information Storage Example)

The second control information storage example illustrated in FIG. 13 will be described.

The example illustrated in FIG. 13 is an example in which the audio control information (NoTrackingAudio) record box is set as the lower box in the user data (udta) box storing the user data in the trak box serving as the audio control information storage box of the MP4 file.

The control information illustrated in FIG. 7 is recorded in the audio control information (NoTrackingAudio) record box illustrated in FIG. 13.

It is possible to record the audio control information in the MP4 file 81, for example, in each metadata recording region described with reference to FIGS. 12 and 13.
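Locating the audio control information (NoTrackingAudio) record box in the second storage example can be sketched with a generic box walker. This is an illustration under assumptions and covers only the case of FIG. 13, in which the ‘NTRK’ box is reached through plain container boxes (moov, trak, udta); reaching it inside the audio sample entry of FIG. 12 would additionally require skipping the fixed fields of the enclosing boxes, which is not shown. The helper names iter_boxes, find_box, and make_box are hypothetical.

```python
import struct
from typing import Iterator, List, Optional, Tuple


def iter_boxes(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (type, body) for each box laid out back to back in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:  # 64-bit size follows the type field
            if offset + 16 > len(data):
                raise ValueError("truncated large-size box header")
            size = struct.unpack_from(">Q", data, offset + 8)[0]
            header = 16
        elif size == 0:  # box extends to the end of the data
            size = len(data) - offset
        if size < header or offset + size > len(data):
            raise ValueError("malformed box size")
        yield box_type.decode("latin-1"), data[offset + header:offset + size]
        offset += size


def find_box(data: bytes, path: List[str]) -> Optional[bytes]:
    """Return the body of the first box matching the path of nested box types."""
    if not path:
        return data
    for box_type, body in iter_boxes(data):
        if box_type == path[0]:
            found = find_box(body, path[1:])
            if found is not None:
                return found
    return None


def make_box(box_type: str, body: bytes) -> bytes:
    """Build a box with a 32-bit size field (used here only to create test data)."""
    return struct.pack(">I4s", 8 + len(body), box_type.encode("latin-1")) + body


# Synthetic data: moov > trak > udta > NTRK, with a 4-byte FullBox header and flags value 1.
ntrk_body = b"\x00\x00\x00\x00" + b"\x01"
moov = make_box("moov", make_box("trak", make_box("udta", make_box("NTRK", ntrk_body))))
print(find_box(moov, ["moov", "trak", "udta", "NTRK"]))
```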

The following three recording examples will be sequentially described below as the specific control information recording example for the MP4 file:

(First audio control information recording example) the audio control information of a channel unit is recorded in the MP4 file;

(Second audio control information recording example) the audio control information of a stream unit is recorded in the MP4 file; and

(Third audio control information recording example) information indicating that the audio control is settable by the user is recorded in the MP4 file.

The respective recording examples will be described below.

[2-1. (First Audio Control Information Recording Example) Recording Example in which Audio Control Information of Channel Unit is Recorded in MP4 File]

The 5.1 ch surround audio described above is configured with the following audio elements:

a first audio element=an output channel of a center front speaker (Center Front);

a second audio element=an output channel of a left front speaker (Left Front);

a third audio element=an output channel of a right front speaker (Right Front);

a fourth audio element=an output channel of a left surround speaker (Left Surround);

a fifth audio element=an output channel of a right surround speaker (Right Surround); and

a sixth audio element=an output channel of a low frequency effect speaker (LFE).

For example, in a case where the 5.1 ch surround audio is used in content such as a current movie, the output channel of the center front speaker (Center Front) is often used for the narration or the like.

In a case where the output channel of the center front speaker (Center Front) is used for narration output in a moving image configured with a celestial sphere image, an omnidirectional image, or a panorama image, it is often desirable that the output channel of the center front speaker (Center Front) be fixed for the narration, and that the other channels be controlled such that audios following the display image position are output.

In a case where the audio control information is recorded in the MP4 file, the following parameters can be recorded in the MP4 file:

(1) the all-audio correspondence control information (no_tracking_flags); and

(2) the audio element (i) correspondence control information (NoTracking).

As described above with reference to FIG. 8, the correspondence relation between the setting value (flag value) of “(1) the all-audio correspondence control information (no_tracking_flags)” and the audio control form is as follows:

setting value=0: all audios are caused to follow a display image (All channels can be tracked);

setting value=1: all audios are caused not to follow a display image (All channels are not tracked);

setting value=2: a display image following audio and a non-following audio are mixed (Some channels can be tracked); and

setting value=4: display image following audio and non-following audio are settable by the user (User selected channels can be tracked).

Further, as described above with reference to FIG. 10, the correspondence relation between the setting value of “(2) the audio element (i) correspondence control information (NoTracking)” and the audio control form is as follows:

a setting value=0: the audio element (i) is caused to follow the display image (the channel can be tracked); and

a setting value=1: the audio element (i) is caused not to follow the display image (the channel is not tracked).

Further, the recording order in a case where the audio element (i) correspondence control information (NoTracking) setting value is recorded is specified in advance as described above with reference to FIG. 7.

[2-2. (Second Audio Control Information Recording Example) Example in which Audio Control Information of Stream Unit is Recorded in MP4 File]

Next, an example in which audio control information of a stream unit is recorded in the MP4 file will be described as the second recording example of recording the audio control information in the MP4 file.

An audio control information recording example for the MP4 file in a case where two audio streams are recorded in the MP4 file will be described as one specific example.

The following two audio streams are assumed to be recorded in the MP4 file:

(1) 5.1 ch surround audio stream; and

(2) 1 ch monaural audio stream.

In a case where the two audio streams are recorded in the MP4 file, the audio control information corresponding to the two audio streams is recorded in the MP4 file.

As an example, the control form has the following settings:

(1) the 5.1 ch surround audio stream is an audio stream configured with audios and the like generated from the subjects in the image and undergoes the image following type control; and

(2) the 1 ch monaural audio stream is an audio stream configured with narration or the like and undergoes the image non-following type control of outputting the audio from a fixed position regardless of the display position.

Further, when an audio is output, the two streams of the 5.1 ch and the 1 ch are decoded, synthesized, and output.

In the audio output process, an audio output control unit of the information processing device performs a process of decoding the 5.1 ch surround audio, setting the decoded 5.1 ch surround audio as an output audio according to a display position, then synthesizing the 5.1 ch surround audio with a decoded stream of the 1 ch monaural audio, and outputting a resulting audio.

[2-3. (Third Audio Control Information Recording Example) Example in which Information Indicating that the Audio Control is Settable by User is Recorded in MP4 File]

Next, an example in which information indicating that the audio control is settable by the user is recorded in the MP4 file will be described as the third audio control information recording example for the MP4 file.

In a case where a plurality of controllable audio elements are included in the MP4 file, it is possible to provide a configuration in which the display image following audio and the display image non-following audio are settable by the user in units of audio elements.

As described above with reference to FIG. 8, the correspondence relation between the setting value (flag value) of (1) the all-audio correspondence control information (no_tracking_flags) and the audio control form is as follows:

setting value=0: all audios are caused to follow a display image (All channels can be tracked);

setting value=1: all audios are caused not to follow a display image (All channels are not tracked);

setting value=2: a display image following audio and a non-following audio are mixed (Some channels can be tracked); and

setting value=4: display image following audio and non-following audio are settable by the user (User selected channels can be tracked).

In a case where the setting value=4 is recorded in the MP4 file, it indicates that the user is able to set the display image following audio and the display image non-following audio for each of a plurality of audio elements.

For example, the following two audio streams are assumed to be recorded in the MP4 file, similarly to the second audio control information recording example:

(1) 5.1 ch surround audio stream; and

(2) 1 ch monaural audio stream.

In a case where the two audio streams are recorded in the MP4 file, the audio control information corresponding to the two audio streams is recorded in the MP4 file.

Various settings can be performed as a specific recording processing configuration, and one example will be described with reference to FIG. 14.

For example, as illustrated in FIG. 14, first, as the audio control information of the stream unit, control information similar to the setting value (flag value) of the “all-audio correspondence control information (no_tracking_flags)” described above with reference to FIG. 8 is recorded:

setting value=0: all audios are caused to follow a display image (All channels can be tracked);

setting value=1: all audios are caused not to follow a display image (All channels are not tracked);

setting value=2: a display image following audio and a non-following audio are mixed (Some channels can be tracked); and

setting value=4: display image following audio and non-following audio are settable by the user (User selected channels can be tracked).

As an example, the control form has the following settings.

The 5.1 ch surround audio stream and the 1 ch monaural audio stream are both assumed to be settable by the user. In this case, the setting value (flag value) of the all-audio correspondence control information (no_tracking_flags) is set to 4 for both streams.

Since the recording process is performed, it is possible to record thecontrol information for the audio element of the stream unit.

Further, in a case where the user setting is performed, the dataprocessing unit of the information processing device performs a processof presenting a user interface (UI) for causing the user to decide thecontrol form to the display unit, and the control form of each audioelement is decided in accordance with a user input.

3. Audio Control Process Sequence Using Audio Control Information Recorded in MP4 File

Next, an audio control process sequence executed in the information processing device, that is, an audio control process sequence using the audio control information recorded in the MP4 file will be described.

Flowcharts illustrated in FIGS. 15 and 16 are flowcharts for describing the audio control process sequence executed in the information processing device 70 serving as a user device.

The information processing device 70 includes a display unit (display) and an audio output unit (speaker).

The information processing device 70 is, for example, a TV, a PC, a mobile terminal, a head mount display (HMD), or the like.

The information processing device 70 acquires the MP4 file from, for example, the server 50 or the medium 60 illustrated in FIG. 4, and reproduces content recorded in the MP4 file.

The reproduction content is content which includes an image in which images in various directions can be observed such as a celestial sphere image, an omnidirectional image, or a panorama image and further includes audio information to be reproduced together with the image.

Image data and audio data are stored in the MP4 file, and the control information corresponding to the image data and the audio data is also stored in the MP4 file.

The audio control information includes the control information described above with reference to FIG. 7.

A process sequence executed in the information processing device 70 will be described with reference to the flowcharts illustrated in FIGS. 15 and 16.

Further, a process according to the flowcharts illustrated in FIGS. 15 and 16 is executed in the information processing device 70. The information processing device 70 includes a data processing unit equipped with a CPU having a program execution function, and each process is executed under the control of the data processing unit. Further, a hardware configuration example of the information processing device 70 will be described later.

The process of each step of the flow illustrated in FIGS. 15 and 16 will be described.

(Step S101)

In step S101, the data processing unit of the information processing device acquires the MP4 file.

(Step S102)

Then, in step S102, the data processing unit of the information processing device acquires the all-audio correspondence control information (no_tracking_flag) from the acquired MP4 file.

This is a process of acquiring the all-audio correspondence control information (no_tracking_flag) 91 in the control information described above with reference to FIG. 7.

(Step S103)

Next, in step S103, the data processing unit of the information processing device determines whether or not a setting of the all-audio correspondence control information acquired in step S102 is (no_tracking_flag=0), that is, a setting of the “display image following type audio control.”

In a case where the setting of the all-audio correspondence control information is (no_tracking_flag=0), that is, the setting of the “display image following type audio control,” the process proceeds to step S104.

On the other hand, in a case where the setting of the all-audio correspondence control information is (no_tracking_flag≠0), that is, not the setting of the “display image following type audio control,” the process proceeds to step S105.

(Step S104)

In a case where it is determined in step S103 that the setting of the all-audio correspondence control information is (no_tracking_flag=0), that is, the setting of the “display image following type audio control,” the data processing unit of the information processing device performs a process of step S104.

In step S104, the data processing unit of the information processing device decides to execute the “display image following type audio control” of causing all the audio elements to follow the display image.

In other words, the audio control of changing the output of each speaker in accordance with the display image position is performed.

(Step S105)

On the other hand, in a case where it is determined in step S103 that the setting of the all-audio correspondence control information is (no_tracking_flag≠0), that is, not the setting of the “display image following type audio control,” the data processing unit of the information processing device performs a process of step S105.

In step S105, the data processing unit of the information processing device determines whether or not the setting of the all-audio correspondence control information acquired in step S102 is (no_tracking_flag=1), that is, the setting of the “display image non-following type audio control.”

In a case where the setting of the all-audio correspondence control information is (no_tracking_flag=1), that is, the setting of the “display image non-following type audio control,” the process proceeds to step S106.

On the other hand, in a case where the setting of the all-audio correspondence control information is (no_tracking_flag≠1), that is, not the setting of the “display image non-following type audio control,” the process proceeds to step S201.

(Step S106)

In a case where it is determined in step S105 that the setting of the all-audio correspondence control information is (no_tracking_flag=1), that is, the setting of the “display image non-following type audio control,” the data processing unit of the information processing device performs a process of step S106.

In step S106, the data processing unit of the information processing device decides to execute the “display image non-following type audio control” of causing all the audio elements not to follow the display image.

In other words, the audio output control having a setting so that the output of each speaker is not changed in accordance with the display image position is performed.

(Step S201)

On the other hand, in a case where it is determined in step S105 that the setting of the all-audio correspondence control information is (no_tracking_flag≠1), that is, not the setting of the “display image non-following type audio control,” the data processing unit of the information processing device performs a process of step S201.

In step S201, the data processing unit of the information processing device determines whether or not the setting of the all-audio correspondence control information acquired in step S102 is (no_tracking_flag=2), that is, whether or not both an element serving as a target of the “display image following type audio control” and an element serving as a target of the “display image non-following type audio control” are included among the individual controllable audio elements included in the MP4 file.

In a case where the setting of the all-audio correspondence control information is (no_tracking_flag=2), that is, the setting indicating that the audio element serving as the target of the “display image following type audio control” and the audio element serving as the target of the “display image non-following type audio control” are mixed, the process proceeds to step S202.

On the other hand, in a case where the setting of the all-audio correspondence control information is (no_tracking_flag≠2), that is, not the setting indicating that the audio element serving as the target of the “display image following type audio control” and the audio element serving as the target of the “display image non-following type audio control” are mixed, the process proceeds to step S251.

Further, in this case, as understood from FIG. 8, the setting of the all-audio correspondence control information is (no_tracking_flag=4), that is, the setting in which the control form is settable by the user.

(Step S251)

In a case where it is determined in step S201 that the setting of the all-audio correspondence control information is (no_tracking_flag≠2), that is, the setting of the all-audio correspondence control information is (no_tracking_flag=4), the process proceeds to step S251.

In step S251, the data processing unit of the information processing device performs the audio control in accordance with the user setting.

Further, when a user setting process is performed, for example, the data processing unit of the information processing device causes an operation screen (UI) on which the user can make settings to be displayed on the display unit to prompt the user to input the control form for each audio element.

The data processing unit of the information processing device decides the control form of each audio element in accordance with the user input information and performs the audio control.

(Step S202)

In a case where it is determined in the determination process of step S201 that the setting of the all-audio correspondence control information is (no_tracking_flag=2), that is, the setting indicating that the audio element serving as the target of the “display image following type audio control” and the audio element serving as the target of the “display image non-following type audio control” are mixed, the process proceeds to step S202.

The process of step S202 and subsequent steps is a process in which the recording information of the loop processing portion 92 in the control information illustrated in FIG. 7 is applied.

In other words, the audio element correspondence control information corresponding to each audio element (i) is read, and the control form for each audio element is decided.

First, the process of step S202 is an initial setting of the audio element identifier (i), and i=1 is set.

(Step S203)

In step S203, the data processing unit of the information processing device determines whether or not a value of the audio element identifier (i) is equal to or less than the number of individual controllable audio elements (count) recorded in the processing target MP4 file.

In a case where i>count,

it indicates that the process has been completed for all the audio elements, and the process proceeds to step S271.

In a case where i≤count,

it indicates that there is an unprocessed audio element, and the process proceeds to step S204.

(Step S204)

In a case where it is determined in step S203 that the audio element identifier=i≤count, the process of step S204 is performed.

In step S204, the data processing unit of the information processing device acquires the setting value of the audio element (i) correspondence control information (NoTracking) corresponding to the audio element identifier (i) from the loop processing portion 92 of the control information illustrated in FIG. 7.

Further, it is determined whether the setting value of the acquired audio element (i) correspondence control information (NoTracking) is

the setting value=0, that is, it is the setting of the “display image following type audio control,” or

the setting value=1, that is, it is the setting of the “display image non-following type audio control.”

In a case where the setting value=0, that is, it is the setting of the “display image following type audio control,” the process proceeds to step S205.

On the other hand, in a case where the setting value=1, that is, it is the setting of the “display image non-following type audio control,” the process proceeds to step S206.

(Step S205)

In a case where it is determined in step S204 that the setting value of the audio element (i) correspondence control information (NoTracking) corresponding to the audio element (i) is

the setting value=0, that is, the setting of the “display image following type audio control,” the process proceeds to step S205.

In step S205, the data processing unit of the information processing device decides to execute the control of the audio element (i) of the processing target as the “display image following type audio control” of causing the audio to follow the display image.

In other words, the audio control of changing the output of each speaker in accordance with the display image position is performed.

(Step S206)

On the other hand, in a case where it is determined in step S204 that the setting value of the audio element (i) correspondence control information (NoTracking) corresponding to the audio element (i) is

the setting value=1, that is, the setting of the “display image non-following type audio control,” the process proceeds to step S206.

In step S206, the data processing unit of the information processing device decides to execute the control of the audio element (i) of the processing target as the “display image non-following type audio control” of causing the audio not to follow the display image.

In other words, the audio output control having a setting so that the output of each speaker is not changed in accordance with the display image position is performed.

(Step S207)

After a processing form of one audio element (i) is decided in step S205 or step S206, in step S207, a process of updating the audio element identifier (i) is performed. In other words,

i=i+1 is set, and

the process proceeds to step S203. After the processing form for all the audio elements stored in the MP4 file is decided, No is determined in the determination process of step S203, and the process proceeds to step S271.

(Step S271)

In step S271, the data processing unit of the information processing device outputs all the audio elements stored in the MP4 file in accordance with the decided control form.

Through the processes, the audio output control is performed in units of audio elements in any one of the following forms:

the “display image following type control;” and

the “display image non-following type control.”
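The decision flow of steps S101 to S271 can be summarized in a short sketch. The following Python code is only an illustration under stated assumptions and is not part of the disclosed device: the names Control, decide_control_forms, and ask_user are hypothetical, the flag values 0, 1, 2, and 4 are those described with reference to FIG. 8, and parsing of the MP4 file is assumed to have been done elsewhere. Unlike the flowchart, which treats every value other than 0, 1, and 2 as the user setting, this sketch rejects values other than 4.

```python
from enum import Enum


class Control(Enum):
    FOLLOWING = "display image following type audio control"
    NON_FOLLOWING = "display image non-following type audio control"


def decide_control_forms(no_tracking_flags, count, no_tracking=(), ask_user=None):
    """Return one Control per audio element (list index 0 is element i=1).

    no_tracking_flags: all-audio correspondence control information.
    count: number of individual controllable audio elements.
    no_tracking: per-element NoTracking values (used when the flag is 2).
    ask_user: hypothetical callback i -> Control (used when the flag is 4).
    """
    if no_tracking_flags == 0:  # steps S103 -> S104
        return [Control.FOLLOWING] * count
    if no_tracking_flags == 1:  # steps S105 -> S106
        return [Control.NON_FOLLOWING] * count
    if no_tracking_flags == 2:  # steps S201 -> S202
        forms = []
        i = 1  # step S202: initialize the audio element identifier
        while i <= count:  # step S203: an unprocessed element remains
            # step S204: read NoTracking for audio element i
            if no_tracking[i - 1] == 0:
                forms.append(Control.FOLLOWING)  # step S205
            else:
                forms.append(Control.NON_FOLLOWING)  # step S206
            i += 1  # step S207
        return forms  # step S271 would output audio using these forms
    if no_tracking_flags == 4:  # step S251: user setting
        if ask_user is None:
            raise ValueError("a user setting callback is required for flag value 4")
        return [ask_user(i) for i in range(1, count + 1)]
    raise ValueError(f"unsupported no_tracking_flags value: {no_tracking_flags}")
```

For example, calling decide_control_forms(2, 6, no_tracking=[1, 0, 0, 0, 0, 0]) would mark only the first audio element as non-following and the remaining five as following.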

4. (Second Embodiment) Embodiment in which Audio Control Information is Recorded in MPD

Next, an embodiment in which the audio control information is recorded in the MPD will be described as a second embodiment.

FIG. 17 is a diagram illustrating an information processing device 70 which executes the audio control of the present disclosure according to the second embodiment, a server 50 which provides content including image data and audio data to the information processing device 70, and a medium 60.

For example, image data such as a celestial sphere image, an omnidirectional image, or a panorama image and audio data are transmitted from the server 50 illustrated in FIG. 4 or read from the medium 60 and provided to the information processing device 70.

The server 50 includes, for example, a broadcasting server 51 such as a broadcasting station and other data providing servers 52, and various data is transmitted to the information processing device 70 via a broadcast wave or a network such as the Internet.

The information processing device 70 receives and reproduces transmission data transmitted from the server 50 via a broadcast wave or a network such as the Internet.

The medium 60 includes various media such as a disk, a flash memory, a hard disk, and the like, which are loaded into the information processing device.

The information processing device 70 reads and reproduces the data recorded on the medium 60.

An information processing device which performs content reproduction is, for example, a TV 71, a PC 72, a mobile terminal 73, a head mount display (HMD) 74, or the like and includes an image display unit and an audio output unit (speaker).

The content provided from the server 50 or the medium 60 to the information processing device 70 is content including image data in which images in various directions can be selectively displayed such as a celestial sphere image, an omnidirectional image, or a panorama image and audio data.

The content is stored, for example, in the MP4 file 81 and provided, similarly to the first embodiment described above.

In the first embodiment described above, for example, the audio control information described above with reference to FIG. 7 is recorded in the trak box serving as the metadata storage region of the MP4 file.

In the present second embodiment, audio control information related to audio data stored in an MP4 file 81 illustrated in FIG. 17 is stored in an MPD file 82 separate from the MP4 file 81 and provided to the information processing device 70.

The MPD file 82 is one manifest file constituting signaling data (metadata) specified in an MPEG-DASH standard which is a standard related to streaming delivery content.

The MPD file 82 is a manifest file for describing metadata which is management information of a moving image or an audio file.

The present second embodiment is an embodiment in which the audio control information related to the audio data stored in the MP4 file 81 is recorded in the MPD file 82.

For example, various control data can be stored in the MPD file 82 in units of periods which are time intervals obtained by subdividing a reproduction period of time of certain content.

A configuration example of the MPD file will be described with reference to FIGS. 18 and 19.

FIG. 18 is a diagram illustrating an example of an MPD format.

Information such as attributes or control information can be described in an MPD in units of various specified ranges to be described below for each stream of an image or audio as illustrated in FIG. 18:

(1) Period defining an interval on a time axis;

(2) AdaptationSet specifying a data type or the like of an image, an audio, or the like;

(3) Representation specifying subdivided lower data type of an image, an audio, or the like; and

(4) SegmentInfo serving as an information recording region of a segment (AV segment) unit of an image or an audio.

FIG. 19 is a diagram illustrating information (control information, management information, attribute information, and the like) corresponding to an AV segment recorded in the MPD which is developed in a chronological order.

Time is assumed to pass from left to right. For example, the time axis corresponds to the reproduction period of time of AV content in the information processing device.

Various pieces of information corresponding to the AV segment are recorded in the MPD. Further, for example, in a case where the MPD file 82 is provided from the server 50 to the information processing device 70, the MPD is transmitted as the signaling data ahead of the MP4 file 81 storing the AV segment which is actual target data.

The information processing device 70 is able to analyze the MPD, acquire access information or codec information of the MP4 file 81 storing the AV segment which is actual reproduction target data, and prepare for reproduction of the AV segment stored in the MP4 file 81.

As described above with reference to FIG. 18, the MPD is configured to record metadata (signaling data) such as the attribute information and the control information related to the AV segment under the following hierarchical settings:

(1) Period;

(2) Adaptation Set;

(3) Representation; and

(4) SegmentInfo.

FIG. 19 is a diagram illustrating the metadata recording regions which are developed on a time axis in accordance with a data type.

FIG. 19 illustrates two periods of a period 1 (Period (1)) and a period 2 (Period (2)) and further illustrates three adaptation sets (AdaptationSet) under the period 1 (Period (1)):

(V11) An adaptation set V11 (Adaptation (V11)) which is an image correspondence information recording region;

(A11) An adaptation set A11 (Adaptation (A11)) which is a Japanese audio correspondence information recording region; and

(A12) An adaptation set A12 (Adaptation (A12)) which is an English audio correspondence information recording region.

(V11) An adaptation set V11 (Adaptation (V11)) which is the image correspondence information recording region has the following two Representations as information recording regions of stream units having different attributes:

(V111) A Representation (V111) (Representation (V111)) which is a low bit rate image correspondence information recording region; and

(V112) A Representation (V112) (Representation (V112)) which is a high bit rate image correspondence information recording region.

Similarly, (A11) the adaptation set A11 (Adaptation (A11)) which is the Japanese audio correspondence information recording region has the following Representation:

(A111) A Representation (A111) (Representation (A111)) which is a Japanese audio correspondence information recording region.

Similarly, (A12) the adaptation set A12 (Adaptation (A12)) which is the English audio correspondence information recording region has the following Representation:

(A121) A Representation (A121) (Representation (A121)) which is an English audio correspondence information recording region.

Further, each Representation has a configuration in which information can be recorded in units of segments.

For example, the information processing device (client) which selects and reproduces a high bit rate image and a Japanese audio at a time t1 selects information related to the high bit rate image and the Japanese audio as a reproduction target and acquires the information from the MPD.

The recording information of the MPD serving as the selection target is the information of segment regions 201 and 202 illustrated in FIG. 19.

As described above, a receiving device selects information corresponding to data (segment) to be set as a reproduction target in the receiving device from the MPD transmitted from a transmitting device as signaling data and refers only to the selected information.

As described above, a data type and segment correspondence information of a time unit can be recorded in the MPD.

In the second embodiment to be described below, image and audio data (AV segment) which is reproduction target data are stored in the MP4 file 81 illustrated in FIG. 17, and control information related to the image and audio data (AV segment) stored in the MP4 file 81 is stored in the MPD file 82.

In a case where the audio control information is recorded in the MPD file 82 illustrated in FIG. 17, information indicating various control forms can be recorded, similarly to the MP4 file described above.

FIG. 20 illustrates a correspondence relation between the control form indicated by the audio control information recorded in the MPD file 82 and the control form indicated by the setting value of “(1) the all-audio correspondence control information (no_tracking_flags)” recorded in the MP4 file.

In the MPD, a new descriptor (Descriptor) for recording the audio control information is set in a role element (Role Element). For example, as illustrated in FIG. 20,

URI=http://foo.bar/scheme/AudioNoTracking

is set as a new descriptor for recording the audio control information.

As illustrated in FIG. 20, the audio control information which can be set in the audio control information recording region of this MPD has the following three types:

(a) NoTracking;

(b) Numerical value character string; and

(c) USER

Further, as illustrated in FIG. 20, the setting values of the types (a) to (c) correspond to the setting values 1, 2, and 4 of “(1) the all-audio correspondence control information (no_tracking_flags)” recorded in the MP4 file described above.

In other words, it has the following correspondence relation as illustrated in FIG. 20.

(a) NoTracking corresponds to the setting value=1 of “(1) the all-audio correspondence control information (no_tracking_flags)” of the MP4 file and indicates the control process of causing all audios not to follow the display image (All channels are not tracked).

(b) Numerical value character string corresponds to the setting value=2 of “(1) the all-audio correspondence control information (no_tracking_flags)” of the MP4 file and indicates that the display image following audio and the display image non-following audio are mixed (Some channels can be tracked).

(c) USER corresponds to the setting value=4 of “(1) the all-audio correspondence control information (no_tracking_flags)” of the MP4 file and indicates that the display image following audio and the display image non-following audio are settable by the user (User selected channels can be tracked).
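The correspondence relation of FIG. 20 can be expressed as a small lookup. The following Python sketch is illustrative only: the function name role_value_to_flag is hypothetical, and it is an assumption here that a numerical value character string consists solely of the digits 0 and 1, as in the “100000” example.

```python
def role_value_to_flag(value):
    """Map an MPD role value to the corresponding no_tracking_flags value.

    "NoTracking"         -> 1 (all channels are not tracked)
    string of 0/1 digits -> 2 (some channels can be tracked)
    "USER"               -> 4 (user selected channels can be tracked)
    """
    if value == "NoTracking":
        return 1
    if value == "USER":
        return 4
    # Assumption: a numerical value character string holds one 0/1 digit
    # per audio element.
    if value and set(value) <= {"0", "1"}:
        return 2
    raise ValueError(f"unrecognized audio control value: {value!r}")
```

An absent descriptor is not handled by this sketch, because the MPD types (a) to (c) correspond only to the setting values 1, 2, and 4.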

Hereinafter, the following three recording examples will be sequentially described as specific control information recording examples in a case where the audio control information is recorded in the MPD file 82 illustrated in FIG. 17:

(first audio control information recording example) the audio control information of the channel unit is recorded in the MPD file;

(second audio control information recording example) the audio control information of the stream unit is recorded in the MPD file; and

(third audio control information recording example) the information indicating that the audio control is settable by the user is recorded in the MPD file.

The respective recording examples will be described below.

[4-1. (First Audio Control Information Recording Example) Recording Example in which Audio Control Information of Channel Unit is Recorded in MPD File]

The 5.1 ch surround audio described above is configured with the following audio elements:

a first audio element=an output channel of a center front speaker (Center Front);

a second audio element=an output channel of a left front speaker (Left Front);

a third audio element=an output channel of a right front speaker (Right Front);

a fourth audio element=an output channel of a left surround speaker (Left Surround);

a fifth audio element=an output channel of a right surround speaker (Right Surround); and

a sixth audio element=an output channel of a low frequency effect (LFE) speaker (LFE).

For example, in a case where the 5.1 ch surround audio is used in content such as a current movie, the output channel of the center front speaker (Center Front) is often used for the narration or the like.

In a case where the output channel of the center front speaker (Center Front) is used for narration output in a moving image configured with a celestial sphere image, an omnidirectional image, or a panorama image, it is often desirable that the output channel of the center front speaker (Center Front) be fixed for the narration and that the other channels be controlled such that audio following the display image position is output.

An example of the audio control information in a case where the audio control information of the channel unit is recorded in the MPD file is shown below as illustrated in FIG. 21.

  <MPD>
    <Period>
      <AdaptationSet mime-type="video/mp4">
        <Representation>
          <BaseURL>http://foo.bar/video.mp4</BaseURL>
        </Representation>
      </AdaptationSet>
      ...
      <!-- Audio in which only Center Channel of 5.1ch is not tracked -->
      <AdaptationSet mime-type="audio/mp4">
        <AudioChannelConfiguration schemeUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011" value="6"/>
        <Role schemeIdUri="http://foo.bar/scheme/AudioNoTracking" value="100000"/>
        <Representation>
          <BaseURL>http://foo.bar/audio.mp4</BaseURL>
        </Representation>
      </AdaptationSet>
      ...
    </Period>
  </MPD>

As illustrated in FIG. 21, a control information recording region 251 is included in the MPD description.

The control information recording region 251 is a region storing control information in which one audio element (Center Channel) of the 5.1 ch stream is set to the “display image non-following type control.”

As described above, in MPD, a new descriptor (Descriptor) for recording the audio control information is set in the role element (Role Element). In the above example, URI=http://foo.bar/scheme/AudioNoTracking is used.

In the example illustrated in FIG. 21, a value described in the role element is “100000.”

As described above with reference to FIG. 20, this value corresponds to the setting value=2 of “(1) the all-audio correspondence control information (no_tracking_flags)” of the MP4 file and indicates that the display image following audio and the display image non-following audio are mixed (Some channels can be tracked).

In other words, the control information recording region 251 records a control information setting value (100000) in which only one audio element (Center Channel) of the 5.1 ch stream is set to the “display image non-following type control,” and the other audio elements are set to the “display image following type control.”

The numerical value character string indicates that the following audiocontrol processes are executed:

the first audio element [center front speaker]=image non-following;

the second audio element [left front speaker]=image following;

the third audio element [right front speaker]=image following;

the fourth audio element [left surround speaker]=image following;

the fifth audio element [right surround speaker]=image following; and

the sixth audio element [low frequency effect speaker (low frequency enhancement)]=image following.
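The per-element reading of the numerical value character string can be sketched as follows. This Python code is an illustration only, assuming that the n-th character corresponds to the n-th audio element in the order listed above and that “1” means image non-following; the names CHANNELS_5_1 and decode_channel_flags are hypothetical.

```python
# Audio elements of the 5.1 ch stream, in the order given in the description.
CHANNELS_5_1 = [
    "Center Front",
    "Left Front",
    "Right Front",
    "Left Surround",
    "Right Surround",
    "LFE",
]


def decode_channel_flags(value, channel_names):
    """Return {channel name: control form} for a string such as "100000"."""
    if len(value) != len(channel_names):
        raise ValueError("one digit is expected per audio element")
    if not set(value) <= {"0", "1"}:
        raise ValueError("only the digits 0 and 1 are expected")
    return {
        # Assumption: "1" = image non-following, "0" = image following.
        name: "image non-following" if digit == "1" else "image following"
        for name, digit in zip(channel_names, value)
    }
```

With the value of FIG. 21, decode_channel_flags("100000", CHANNELS_5_1) marks only Center Front as image non-following.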

[4-2. (Second Audio Control Information Recording Example) Example in which Audio Control Information of Stream Unit is Recorded in MPD File]

Next, an example in which the audio control information of the stream unit is recorded in the MPD file will be described as the second recording example of recording the audio control information in the MPD.

As one specific example, an example in which the audio control information of the stream unit is recorded in a case where an audio stream of 1 ch is recorded in the MP4 file is shown below as illustrated in FIG. 22.

  <MPD>
    <Period>
      <AdaptationSet mime-type="video/mp4">
        <Representation>
          <BaseURL>http://foo.bar/video.mp4</BaseURL>
        </Representation>
      </AdaptationSet>
      ...
      <!-- Audio in which stream of 1ch is not tracked -->
      <AdaptationSet mime-type="audio/mp4">
        <AudioChannelConfiguration schemeUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011" value="1"/>
        <Role schemeIdUri="http://foo.bar/scheme/AudioNoTracking" value="NoTracking"/>
        <Representation>
          <BaseURL>http://foo.bar/audio1.mp4</BaseURL>
        </Representation>
      </AdaptationSet>
      ...
    </Period>
  </MPD>

As illustrated in FIG. 22, a control information recording region 252 is included in the MPD description.

The control information recording region 252 is a recording region of control information in which one audio element of one channel stream is set to the “display image non-following type control.”

As described above, in MPD, a new descriptor (Descriptor) for recording the audio control information is set in the role element (Role Element). In the above example, URI=http://foo.bar/scheme/AudioNoTracking is used.

In the example illustrated in FIG. 22, a value described in the role element is “NoTracking.”

As described above with reference to FIG. 20, the value corresponds to the setting value=1 of “(1) the all-audio correspondence control information (no_tracking_flags)” of the MP4 file, that is, the control information setting value for executing the process of causing all audios not to follow the display image (All channels are not tracked).

[4-3. (Third Audio Control Information Recording Example) Example in which Information Indicating that Audio Control is Settable by User is Recorded in MPD File]

Next, an example in which the information indicating that the audio control is settable by the user is recorded in the MPD file will be described as the third audio control information recording example for the MPD file.

Similarly to the first embodiment described above, in this second embodiment, in a case where a plurality of controllable audio elements are included, the user is able to set the display image following audio and the display image non-following audio in units of audio elements.

An example of the audio control information in a case where the audio control information indicating that the display image following audio and the display image non-following audio are settable by the user in units of audio elements is recorded in the MPD file is shown below as illustrated in FIG. 23.

  <MPD>
    <Period>
      <AdaptationSet mime-type="video/mp4">
        <Representation>
          <BaseURL>http://foo.bar/video.mp4</BaseURL>
        </Representation>
      </AdaptationSet>
      ...
      <!-- Audio in which only Center Channel of 2ch is not tracked -->
      <AdaptationSet mime-type="audio/mp4">
        <AudioChannelConfiguration schemeUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011" value="2"/>
        <Role schemeIdUri="http://foo.bar/scheme/AudioNoTracking" value="USER"/>
        <Representation>
          <BaseURL>http://foo.bar/audio.mp4</BaseURL>
        </Representation>
      </AdaptationSet>
      ...
    </Period>
  </MPD>

As illustrated in FIG. 23, a control information recording region 253 is included in the MPD description.

The audio control information indicating that the display image following audio and the display image non-following audio are settable by the user in units of audio elements is recorded in the control information recording region 253.

As described above, in MPD, a new descriptor (Descriptor) for recording the audio control information is set in the role element (Role Element). In the above example, URI=http://foo.bar/scheme/AudioNoTracking is used.

In the example illustrated in FIG. 23, a value described in the role element is “USER.”

As described above with reference to FIG. 20, this value corresponds to the setting value=4 of “(1) the all-audio correspondence control information (no_tracking_flags)” of the MP4 file, that is, an audio control information setting value indicating that the display image following audio and the display image non-following audio are settable by the user in units of audio elements.
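The correspondence between role element values and the MP4 setting values can be written as a small lookup table. This sketch covers only the two pairs stated in this part of the text (“NoTracking” and value 1, “USER” and value 4); the table name and helper function are hypothetical, and other setting values of no_tracking_flags are deliberately not reproduced here.

```python
# Role element values and the no_tracking_flags setting values they correspond to.
# Only the two pairs named in the surrounding text are listed.
ROLE_VALUE_TO_NO_TRACKING_FLAGS = {
    "NoTracking": 1,  # all audio elements do not follow the display image
    "USER": 4,        # control form is settable by the user per audio element
}


def no_tracking_flags_for(role_value):
    """Return the corresponding setting value, or None if it is not listed here."""
    return ROLE_VALUE_TO_NO_TRACKING_FLAGS.get(role_value)


print(no_tracking_flags_for("USER"))
```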

5. Audio Control Process Sequence Using Audio Control Information Recorded in MPD File

Next, an audio control process sequence executed in the information processing device, that is, an audio control process sequence using the audio control information recorded in the MPD file will be described.

Flowcharts illustrated in FIGS. 24 and 25 are flowcharts for describing the audio control process sequence executed in the information processing device 70 serving as a user device.

The information processing device 70 includes a display unit (display) and an audio output unit (speaker).

The information processing device 70 is, for example, a TV, a PC, a mobile terminal, a head mount display (HMD), or the like.

The information processing device 70 acquires the MPD file from, for example, the server 50 or the medium 60 illustrated in FIG. 4, and reproduces content recorded in the MPD file.

The reproduction content is content which includes an image in which images in various directions can be observed such as a celestial sphere image, an omnidirectional image, or a panorama image and further includes audio information to be reproduced together with the image.

Image data and audio data are stored in the MP4 file, and the control information corresponding to the image data and the audio data is also stored in the MPD file.

A process sequence executed in the information processing device 70 will be described with reference to the flowcharts illustrated in FIGS. 24 and 25.

Further, a process according to the flowcharts illustrated in FIGS. 24 and 25 is executed in the information processing device 70. The information processing device 70 includes a data processing unit equipped with a CPU having a program execution function, and each process is executed under the control of the data processing unit. Further, a hardware configuration example of the information processing device 70 will be described later.

The process of each step of the flow illustrated in FIGS. 24 and 25 will be described.

(Step S301)

In step S301, the data processing unit of the information processing device acquires the MPD file.

(Step S302)

Then, in step S302, the data processing unit of the information processing device determines whether or not there is the following role element, that is, the role element in which the following audio control information is recorded in the acquired MPD file:

<Role schemeIdUri=http://foo.bar/scheme/AudioNoTracking>

In a case where there is a role element in which the audio control information is recorded, the process proceeds to step S304, and otherwise, the process proceeds to step S303.

(Step S303)

In a case where it is determined that there is no role element in which the audio control information is recorded in the adaptation set of the MPD file, the data processing unit of the information processing device performs a process of step S303.

In step S303, the data processing unit of the information processing device decides to execute the “display image following type audio control” of causing all audio elements to follow the display image.

In other words, the audio control of changing the output of each speaker in accordance with the display image position is performed.

(Step S304)

On the other hand, in a case where it is determined in step S302 that there is a role element in which the audio control information is recorded in the adaptation set of the MPD file, a process of step S304 is performed.

In step S304, the data processing unit of the information processing device determines whether or not a value of the audio control information recorded in the adaptation set of the MPD file acquired in step S302 is “NoTracking.”

In a case where “NoTracking” is recorded, the process proceeds to step S305.

Otherwise, the process proceeds to step S401.

(Step S305)

In a case where it is determined in step S304 that the value of the audio control information recorded in the adaptation set of the MPD file is “NoTracking,” the data processing unit of the information processing device performs a process of step S305.

In step S305, the data processing unit of the information processing device decides to execute the “display image non-following type audio control” of causing all audio elements not to follow the display image.

In other words, the audio output control having a setting so that the output of each speaker is not changed in accordance with the display image position is performed.

(Step S401)

On the other hand, in a case where it is determined in step S304 that the value of the audio control information recorded in the adaptation set of the MPD file is not “NoTracking,” the data processing unit of the information processing device performs a process of step S401.

In step S401, the data processing unit of the information processing device determines whether or not the value of the audio control information recorded in the adaptation set of the MPD file acquired in step S302 is “USER.”

In a case where the value of the audio control information recorded in the adaptation set of the MPD file is “USER,” the process proceeds to step S451.

On the other hand, in a case where the value of the audio control information recorded in the adaptation set of the MPD file is not “USER,” the process proceeds to step S402.
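The branching of steps S302 through S401 can be summarized as a small decision function. This is a minimal sketch rather than the disclosed implementation: the function name and the returned labels are hypothetical, and role_value is assumed to be None when no role element carrying audio control information was found in the MPD file.

```python
def decide_control_mode(role_value):
    """Map the role element value to the branch taken in the flow of FIG. 24."""
    if role_value is None:
        # S302 -> S303: no audio control information, all elements follow the image.
        return "follow_all"
    if role_value == "NoTracking":
        # S304 -> S305: all elements do not follow the image.
        return "no_follow_all"
    if role_value == "USER":
        # S401 -> S451: the control form is decided by the user setting.
        return "user_setting"
    # S401 -> S402: following and non-following elements are mixed,
    # so the per-element control information has to be read.
    return "per_element"


for value in (None, "NoTracking", "USER", "100000"):
    print(value, "->", decide_control_mode(value))
```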

(Step S451)

In a case where it is determined in step S401 that the value of audio control information recorded in the adaptation set of the MPD file is “USER,” the process proceeds to step S451.

In step S451, the data processing unit of the information processing device executes the audio control according to the user setting.

Further, when a user setting process is performed, for example, the data processing unit of the information processing device causes an operation screen (UI) which is settable by the user to be displayed on the display unit to urge the user to input the control form for each audio element.

The data processing unit of the information processing device decides the control form of each audio element in accordance with the user input information and performs the audio control.
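How the user input might be turned into a control form per audio element is sketched below. The element identifiers, the dictionary layout of the user input, and the choice of “following” as the default for elements the user did not set are all assumptions made for illustration; the text does not specify them.

```python
VALID_FORMS = ("following", "non-following")


def control_forms_from_user(element_ids, user_choices, default="following"):
    """Decide the control form of each audio element from the user input.

    element_ids:  identifiers of the controllable audio elements (hypothetical).
    user_choices: mapping from element identifier to the form chosen on the UI.
    default:      assumed form for elements the user left unset.
    """
    forms = {}
    for element_id in element_ids:
        choice = user_choices.get(element_id, default)
        if choice not in VALID_FORMS:
            raise ValueError(f"unknown control form: {choice!r}")
        forms[element_id] = choice
    return forms


print(control_forms_from_user(["narration", "ambience"],
                              {"narration": "non-following"}))
```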

(Step S402)

In a case where it is determined in the determination process of step S401 that the value of the audio control information recorded in the adaptation set of the MPD file is not “USER,” that is, the setting indicating that the audio element serving as the target of the “display image following type audio control” and the audio element serving as the target of the “display image non-following type audio control” are mixed, the process proceeds to step S402.

The process of step S402 and subsequent steps is a process of reading the audio element correspondence control information corresponding to each audio element (i) and deciding the control form for each audio element.

The process of step S402 is a process of reading the control information bit string corresponding to each audio element from the head.

For example, in a case where there are six audio elements of six channels constituting a 5.1 ch surround audio, a bit string is, for example, [100000].

(Step S403)

In step S403, it is determined whether or not there is unprocessed data of the audio control information bit string, and in a case where there is unprocessed data, a process of step S404 and subsequent steps based on bit values sequentially read from the head is performed.

(Step S404)

In step S404, the data processing unit of the information processing device performs a process based on the bit values sequentially read from the control information bit string corresponding to each audio element.

Further, it is determined whether the setting value (bit value) corresponding to the acquired audio element (i) is

the setting value=0, that is, the setting of the “display image following type audio control,” or

the setting value=1, that is, the setting of the “display image non-following type audio control.”

In a case where the setting value (bit value) corresponding to the acquired audio element (i) is the setting value=0, that is, the setting of the “display image following type audio control,” the process proceeds to step S405.

On the other hand, in a case where the setting value (bit value) corresponding to the acquired audio element (i) is the setting value=1, that is, the setting of the “display image non-following type audio control,” the process proceeds to step S406.

(Step S405)

In a case where it is determined in step S404 that the setting value of the audio element (i) correspondence control information (NoTracking) corresponding to the audio element (i) is the setting value=0, that is, the setting of the “display image following type audio control,” the process proceeds to step S405.

In step S405, the data processing unit of the information processing device decides to execute the control of the audio element (i) of the processing target as the “display image following type audio control” of causing the audio to follow the display image.

In other words, the audio control of changing the output of each speaker in accordance with the display image position is performed.

If the process of step S405 is completed, the process returns to step S403, and the process based on a setting value (bit value) corresponding to a next audio element is performed.

(Step S406)

On the other hand, in a case where it is determined in step S404 that the setting value of the audio element (i) correspondence control information (NoTracking) corresponding to the audio element (i) is the setting value=1, that is, the setting of the “display image non-following type audio control,” the process proceeds to step S406.

In step S406, the data processing unit of the information processing device decides to execute the control of the audio element (i) of the processing target as the “display image non-following type audio control” of causing the audio not to follow the display image.

In other words, the audio output control having a setting so that the output of each speaker is not changed in accordance with the display image position is performed.

If the process of step S406 is completed, the process returns to step S403, and the process based on a setting value (bit value) corresponding to a next audio element is performed.
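The loop of steps S402 through S406 can be sketched as follows. The sketch assumes the control information bit string has already been obtained as a text string of "0" and "1" characters, one per audio element, read from the head; how the string is extracted from the file is outside this example, and the function name and returned labels are hypothetical.

```python
def decide_control_forms(bit_string):
    """Decide the control form of each audio element (i) from its bit value."""
    forms = []
    # S402/S403: read the bits from the head until no unprocessed data remains.
    for index, bit in enumerate(bit_string):
        if bit == "0":
            # S404 -> S405: display image following type audio control.
            forms.append((index, "following"))
        elif bit == "1":
            # S404 -> S406: display image non-following type audio control.
            forms.append((index, "non-following"))
        else:
            raise ValueError(f"unexpected bit value {bit!r} at position {index}")
    return forms


# Bit string from the 5.1 ch example in the text: only the first element is fixed.
for index, form in decide_control_forms("100000"):
    print(index, form)
```

With the bit string [100000], the first audio element is handled by the non-following control and the remaining five by the following control.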

(Step S471)

In a case where it is determined in step S403 that there is no unprocessed element, the data processing unit of the information processing device causes the process to proceed to step S471.

In step S471, the data processing unit of the information processing device outputs all the audio elements stored in the MPD file in accordance with the decided control form.

Through the processes, the audio output control is performed in units of audio elements in any one of the following forms:

the “display image following type control;” and

the “display image non-following type control.”
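The difference between the two forms can be illustrated with a single angle computation. The sketch below rests on assumptions not stated in the text: directions are horizontal azimuths in degrees, the display direction is given as a yaw angle, an image following element has its direction defined relative to the image content, and a non-following element has its direction defined relative to the listener. All names are hypothetical.

```python
def rendered_azimuth(source_azimuth_deg, view_yaw_deg, follows_image):
    """Return the direction from which the element is output, in [-180, 180).

    follows_image=True:  the audio source direction moves with the display image,
                         so the current view direction is subtracted.
    follows_image=False: the audio source direction is left unchanged whatever
                         direction is being displayed.
    """
    if follows_image:
        relative = source_azimuth_deg - view_yaw_deg
    else:
        relative = source_azimuth_deg
    # Wrap the result into the range [-180, 180).
    return (relative + 180.0) % 360.0 - 180.0


# The view turns 30 degrees toward a source located at 30 degrees in the image.
print(rendered_azimuth(30.0, 30.0, follows_image=True))
print(rendered_azimuth(30.0, 30.0, follows_image=False))
```

In the first call the source ends up directly in front of the listener, while in the second call its direction is the same as before the view was turned. A per-speaker gain computation would normally follow, which is not shown here.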

6. Hardware Configuration Example of Information Processing Device

Next, hardware configuration examples of the information processing device and the server which perform the processes according to the embodiment will be described with reference to FIG. 26.

Hardware illustrated in FIG. 26 is an example of a hardware configuration of the information processing device (the user device) 70 illustrated in FIGS. 4 and 17, that is, the information processing device (the user device) 70 which executes the image reproduction and the audio output.

Further, hardware illustrated in FIG. 26 is an example of a hardware configuration of the server 50 illustrated in FIGS. 4 and 17, that is, the server 50 which performs a process of generating a file storing the image data, the audio data, and the audio control information and transmitting the file to the information processing device (the user device) 70.

A central processing unit (CPU) 301 functions as a data processing unit that performs various kinds of processes in accordance with a program stored in a read only memory (ROM) 302 or a storage unit 308. For example, the CPU 301 performs the processes according to the sequence described in the above-described embodiment. The random access memory (RAM) 303 stores programs executed by the CPU 301, data, and the like. The CPU 301, the ROM 302, and the RAM 303 are connected to one another via a bus 304.

The CPU 301 is connected to an input/output interface 305 via the bus 304, and an input unit 306 configured with various kinds of switches, a keyboard, a mouse, a microphone, or the like, a display unit 307 configured with a display, a speaker, or the like, and audio output units 321-1 to 321-n are connected to the input/output interface 305. The CPU 301 executes various kinds of processes in accordance with a command input from the input unit 306, and outputs processing results to, for example, the display unit 307 and the audio output units 321-1 to 321-n.

A storage unit 308 connected to the input/output interface 305 is configured with, for example, a hard disk or the like, and stores programs executed by the CPU 301 and various data. A communication unit 309 functions as a transceiving unit for data communication performed via a network such as the Internet or a local area network and a transceiving unit for broadcast waves and performs communication with an external device.

A drive 310 connected to the input/output interface 305 drives a removable medium 311 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory such as a memory card, and executes recording or reading of data.

Further, encoding or decoding of data can be performed as a process of the CPU 301 serving as the data processing unit, but a codec which is dedicated hardware for executing an encoding process or a decoding process may be provided.

7. Conclusion of Configuration of Present Disclosure

The embodiment of the present disclosure has been described above in detail with reference to the specific example. However, it would be understood that those skilled in the art are able to make a modification or a substitution of the embodiment without departing from the gist of the present disclosure. In other words, the embodiment of the present disclosure discloses the present invention in an exemplary form and should not be interpreted restrictively. In order to judge the gist of the present disclosure, claims set forth below should be taken into consideration.

Further, the technology disclosed in this specification may have the following configurations.

(1) An information processing device, including:

a display unit that is able to selectively display images in different directions; and

a data processing unit that controls an audio to be output to the display unit together with an image display,

in which the data processing unit executes,

in units of individual controllable audio elements,

image following type audio control of moving an audio source direction in accordance with movement of the display image of the display unit and

image non-following type audio control of not moving the audio source direction in accordance with the movement of the display image of the display unit.

(2) The information processing device according to (1),

in which the data processing unit acquires audio control information related to output audio data and executes any one of the image following type audio control and the image non-following type audio control in units of individual controllable audio elements in accordance with the acquired audio control information.

(3) The information processing device according to (2),

in which the audio control information includes all-audio correspondence control information which is control information corresponding to all audios serving as an output target, and

the data processing unit executes collective control of all the audios serving as the output target in accordance with a setting value of the all-audio correspondence control information.

(4) The information processing device according to (3),

in which, in a case where the setting value of the all-audio correspondence control information is a setting value indicating the image following type audio control,

the data processing unit executes the image following type audio control of moving the audio source direction in accordance with the movement of the display image of the display unit for all the audios serving as the output target.

(5) The information processing device according to (3),

in which, in a case where the setting value of the all-audio correspondence control information is a setting value indicating the image non-following type audio control,

the data processing unit executes the image non-following type audio control of not moving the audio source direction in accordance with the movement of the display image of the display unit for all the audios serving as the output target.

(6) The information processing device according to any of (3) to (5),

in which the audio control information includes audio element correspondence control information which is control information corresponding to each audio element serving as an output target,

in a case where the setting value of the all-audio correspondence control information is a setting value indicating that an audio element of the image following type audio control target and an audio element of the image non-following type audio control are mixed, the data processing unit further acquires the audio element correspondence control information and controls each audio element serving as the output target in accordance with the setting value of the audio element correspondence control information.

(7) The information processing device according to (6),

in which the data processing unit executes the image following type audio control of moving the audio source direction in accordance with the movement of the display image of the display unit for the audio element in which the setting value of the audio element correspondence control information is a setting value indicating the image following type audio control.

(8) The information processing device according to (6),

in which the data processing unit executes the image non-following type audio control of not moving the audio source direction in accordance with the movement of the display image of the display unit for the audio element in which the setting value of the audio element correspondence control information is a setting value indicating the image non-following type audio control.

(9) The information processing device according to any of (2) to (8),

in which the audio control information is stored in an MP4 file, and

the data processing unit acquires the audio control information related to the output audio data from the MP4 file and executes any one of the image following type audio control and the image non-following type audio control in units of individual controllable audio elements in accordance with the acquired audio control information.

(10) The information processing device according to (9),

in which the audio control information is stored in a trak box of the MP4 file, and

the data processing unit acquires the audio control information related to the output audio data from the trak box of the MP4 file and executes any one of the image following type audio control and the image non-following type audio control in units of individual controllable audio elements in accordance with the acquired audio control information.

(11) The information processing device according to any of (2) to (8),

in which the audio control information is stored in a media presentation description (MPD) file, and

the data processing unit acquires the audio control information related to the output audio data from the MPD file and executes any one of the image following type audio control and the image non-following type audio control in units of individual controllable audio elements in accordance with the acquired audio control information.

(12) The information processing device according to (11),

in which the audio control information is stored in an adaptation set recording region of the MPD file, and

the data processing unit acquires the audio control information related to the output audio data from the adaptation set recording region of the MPD file and executes any one of the image following type audio control and the image non-following type audio control in units of individual controllable audio elements in accordance with the acquired audio control information.

(13) A data delivery server, including:

a data processing unit that generates a file storing

image data including images in different directions which are selectively displayable,

audio data to be output together with a display image which is selected from the image data and displayed, and

audio control information indicating any one of image following type audio control and image non-following type audio control which is executed in units of individual controllable audio elements,

the image following type audio control being executed such that an audio source direction is moved in accordance with movement of the display image,

the image non-following type audio control being executed such that the audio source direction is not moved in accordance with the movement of the display image; and

a communication unit that transmits the file generated by the data processing unit.

(14) An information recording medium storing

image data including images in different directions which are selectively displayable,

audio data to be output together with a display image which is selected from the image data and displayed, and

audio control information indicating any one of image following type audio control and image non-following type audio control which is executed in units of individual controllable audio elements,

the image following type audio control being executed such that an audio source direction is moved in accordance with movement of the display image,

the image non-following type audio control being executed such that the audio source direction is not moved in accordance with the movement of the display image,

in which a reproducing device that reproduces read data from the information recording medium executes any one of the image following type audio control and the image non-following type audio control in units of individual controllable audio elements in accordance with the audio control information.

(15) An information processing method of controlling output audio in an information processing device,

the information processing device including

a display unit that is able to selectively display images in different directions and

a data processing unit that controls an audio to be output to the display unit together with an image display,

the information processing method including:

executing, by the data processing unit, in units of individual controllable audio elements,

image following type audio control of moving an audio source direction in accordance with movement of the display image of the display unit and

image non-following type audio control of not moving the audio source direction in accordance with the movement of the display image of the display unit.

(16) A program causing an information processing device to control an output audio,

the information processing device including

a display unit that is able to selectively display images in different directions, and

a data processing unit that controls an audio to be output to the display unit together with an image display,

the program causing the data processing unit to execute:

in units of individual controllable audio elements,

image following type audio control of moving an audio source direction in accordance with movement of the display image of the display unit and

image non-following type audio control of not moving the audio source direction in accordance with the movement of the display image of the display unit.

Further, a series of processes described in this specification can be executed by hardware, software, or a combination of both. In a case where the processes are executed by software, it is possible to install a program having a process sequence recorded therein in a memory in a computer incorporated into dedicated hardware and execute the program, or it is possible to install the program in a general-purpose computer capable of executing various kinds of processes and execute the program. For example, the program may be recorded in a recording medium in advance. Instead of installing the program from the recording medium to the computer, the program may be received via a network such as a local area network (LAN), the Internet, or the like and installed in a recording medium such as an internal hard disk.

Further, various kinds of processes described in this specification may be chronologically executed in accordance with the description or may be executed in parallel or individually depending on a processing capability of a device which executes the processes or as necessary. Further, in this specification, a system refers to a logical aggregate configuration of a plurality of devices and is not limited to a configuration in which devices of respective components are disclosed in a single housing.

INDUSTRIAL APPLICABILITY

As described above, according to the configuration of one embodiment of the present disclosure, a device and a method which are capable of performing image following type audio control in which an audio source direction follows movement of a display image of a display unit or image non-following type audio control in units of individual audio elements are implemented.

Specifically, images in different directions are selectively displayed on the display unit, and an output audio is controlled in accordance with an image display. The data processing unit executes image following type audio control of moving an audio source direction in accordance with movement of the display image of the display unit and image non-following type audio control of not moving the audio source direction in accordance with the movement of an image in units of individual controllable audio elements. The data processing unit acquires audio control information from an MP4 file or a media presentation description (MPD) file and executes either the image following type audio control or the image non-following type audio control in accordance with the acquired audio control information in units of individual controllable audio elements.

With this configuration, a device and a method which are capable of performing image following type audio control in which an audio source direction follows movement of a display image of a display unit or image non-following type audio control in units of individual audio elements are implemented.

REFERENCE SIGNS LIST

-   10 Image data
-   20 Mobile terminal
-   25 Speaker
-   30 Head mount display (HMD)
-   35 Speaker
-   50 Server
-   51 Broadcasting server
-   52 Data delivery server
-   60 Medium
-   70 Information processing device
-   71 TV
-   72 PC
-   73 Mobile terminal
-   74 Head mount display (HMD)
-   81 MP4 file
-   82 MPD file
-   301 CPU
-   302 ROM
-   303 RAM
-   304 Bus
-   305 Input/output interface
-   306 Input unit
-   307 Display unit
-   308 Storage unit
-   309 Communication unit
-   310 Drive
-   311 Removable medium
-   321 Audio output unit

The invention claimed is:
 1. An information processing device,comprising: a display unit configured to selectively output a displayimage from among a plurality of images in a plurality of differentdirections; and a data processing unit configured to acquire audiocontrol information related to output audio data, and control audio tobe output to the display unit together with the display image inaccordance with the acquired audio control information, wherein the dataprocessing unit executes, for one or more individual controllable audioelements related to the display image, at least one of image followingtype audio control of moving an audio source direction in accordancewith movement of the display image of the display unit between theplurality of images in the plurality of different directions, or imagenon-following type audio control of fixing the audio source directionregardless of the movement of the display image of the display unit,wherein the data processing unit determines whether to execute the imagefollowing type audio control or the image non-following type audiocontrol in accordance with the acquired audio control information,wherein the audio control information includes all-audio correspondencecontrol information which is control information indicating an audiocontrol form corresponding to the one or more individual controllableaudio elements serving as an output target, wherein the audio controlform determines whether to execute the image following type audiocontrol or the image non-following type audio control for eachindividual controllable audio element or to execute the image followingtype audio control or the image non-following type audio controlcollectively for all of the one or more individual controllable audioelements, and wherein the display unit and the data processing unit areeach implemented via at least one processor.
 2. The informationprocessing device according to claim 1, wherein the data processing unitexecutes collective control of all the one or more individualcontrollable audio elements serving as the output target in accordancewith a setting value of the all-audio correspondence controlinformation.
 3. The information processing device according to claim 2,wherein, in a case where the setting value of the all-audiocorrespondence control information indicates the image following typeaudio control, the data processing unit executes the image followingtype audio control of moving the audio source direction in accordancewith the movement of the display image of the display unit for all theone or more individual controllable audio elements serving as the outputtarget.
 4. The information processing device according to claim 2,wherein, in a case where the setting value of the all-audiocorrespondence control information indicates the image non-followingtype audio control, the data processing unit executes the imagenon-following type audio control of fixing the audio source directionregardless of the movement of the display image of the display unit forall the one or more individual controllable audio elements serving asthe output target.
5. The information processing device according to claim 2, wherein the audio control information includes audio element correspondence control information which is control information corresponding to each individual controllable audio element serving as an output target, wherein, in a case where the setting value of the all-audio correspondence control information indicates that an individual controllable audio element of the image following type audio control target and an individual controllable audio element of the image non-following type audio control are mixed, the data processing unit further acquires the audio element correspondence control information and controls each individual controllable audio element serving as the output target in accordance with a corresponding setting value of the audio element correspondence control information.
6. The information processing device according to claim 5, wherein the data processing unit executes the image following type audio control of moving the audio source direction in accordance with the movement of the display image of the display unit for each individual controllable audio element in which the corresponding setting value of the audio element correspondence control information indicates the image following type audio control.
7. The information processing device according to claim 5, wherein the data processing unit executes the image non-following type audio control of fixing the audio source direction regardless of the movement of the display image of the display unit for each individual controllable audio element in which the corresponding setting value of the audio element correspondence control information indicates the image non-following type audio control.
8. The information processing device according to claim 5, wherein the data processing unit executes the image following type audio control of moving the audio source direction in accordance with the movement of the display image of the display unit for each individual controllable audio element corresponding to one or more objects included in the display image.
9. The information processing device according to claim 5, wherein the data processing unit executes the image non-following type audio control of not moving the audio source direction in accordance with the movement of the display image of the display unit for each individual controllable audio element that does not correspond to any of the objects included in the display image.
10. The information processing device according to claim 1, wherein the audio control information is stored in an MP4 file, and wherein the data processing unit acquires the audio control information related to the output audio data from the MP4 file and executes at least one of the image following type audio control or the image non-following type audio control for each of the one or more individual controllable audio elements in accordance with the acquired audio control information.
11. The information processing device according to claim 10, wherein the audio control information is stored in a trak box of the MP4 file, and wherein the data processing unit acquires the audio control information related to the output audio data from the trak box of the MP4 file and executes at least one of the image following type audio control or the image non-following type audio control for each of the one or more individual controllable audio elements in accordance with the acquired audio control information.
12. The information processing device according to claim 1, wherein the audio control information is stored in a media presentation description (MPD) file, and the data processing unit acquires the audio control information related to the output audio data from the MPD file and executes at least one of the image following type audio control or the image non-following type audio control for each of the one or more individual controllable audio elements in accordance with the acquired audio control information.
13. The information processing device according to claim 12, wherein the audio control information is stored in an adaptation set recording region of the MPD file, and wherein the data processing unit acquires the audio control information related to the output audio data from the adaptation set recording region of the MPD file and executes at least one of the image following type audio control or the image non-following type audio control for each of the one or more individual controllable audio elements in accordance with the acquired audio control information.
14. A data delivery server, comprising: a data processing unit that generates a file storing image data including a plurality of images in a plurality of different directions which are selectively displayable, audio data to be output together with a display image which is selected from the image data and displayed, and audio control information indicating at least one of image following type audio control or image non-following type audio control which is executed for one or more individual controllable audio elements related to the display image, the image following type audio control being executed such that an audio source direction is moved in accordance with movement of the display image between the plurality of images in the plurality of different directions, and the image non-following type audio control being executed such that the audio source direction is fixed regardless of the movement of the display image; and a communication unit that transmits the file generated by the data processing unit, wherein the audio control information includes all-audio correspondence control information which is control information indicating an audio control form corresponding to the one or more individual controllable audio elements serving as an output target, wherein the audio control form determines whether to execute the image following type audio control or the image non-following type audio control for each individual controllable audio element or to execute the image following type audio control or the image non-following type audio control collectively for all of the one or more individual controllable audio elements, and wherein the data processing unit and the communication unit are each implemented via at least one processor.
15. A non-transitory computer-readable storage medium having embodied thereon a program, which when executed by a computer, causes the computer to execute a method, the method comprising: storing image data including a plurality of images in a plurality of different directions which are selectively displayable; storing audio data to be output together with a display image which is selected from the image data and displayed; and storing audio control information indicating at least one of image following type audio control or image non-following type audio control which is executed for one or more individual controllable audio elements related to the display image, the image following type audio control being executed such that an audio source direction is moved in accordance with movement of the display image between the plurality of images in the plurality of different directions, and the image non-following type audio control being executed such that the audio source direction is fixed regardless of the movement of the display image, wherein a reproducing device that reproduces read data from the non-transitory computer-readable storage medium executes at least one of the image following type audio control or the image non-following type audio control in the units of individual controllable audio elements in accordance with the audio control information, wherein the audio control information includes all-audio correspondence control information which is control information indicating an audio control form corresponding to the one or more individual controllable audio elements serving as an output target, and wherein the audio control form determines whether to execute the image following type audio control or the image non-following type audio control for each individual controllable audio element or to execute the image following type audio control or the image non-following type audio control collectively for all of the one or more individual controllable audio elements.
16. An information processing method of controlling output audio in an information processing device, the information processing device comprising a display unit configured to selectively output a display image from among a plurality of images in a plurality of different directions, and a data processing unit configured to control audio to be output to the display unit together with the display image, the information processing method comprising: executing, by the data processing unit, for one or more individual controllable audio elements related to the display image in accordance with acquired audio control information, image following type audio control of moving an audio source direction in accordance with movement of the display image of the display unit between the plurality of images in the plurality of different directions, or image non-following type audio control of fixing the audio source direction regardless of the movement of the display image of the display unit, wherein the audio control information includes all-audio correspondence control information which is control information indicating an audio control form corresponding to the one or more individual controllable audio elements serving as an output target, and wherein the audio control form determines whether to execute the image following type audio control or the image non-following type audio control for each individual controllable audio element or to execute the image following type audio control or the image non-following type audio control collectively for all of the one or more individual controllable audio elements.
17. A non-transitory computer-readable storage medium having embodied thereon a program, which when executed by an information processing device causes the information processing device to execute a method, the method comprising: selectively outputting a display image from among a plurality of images in a plurality of different directions; and controlling audio to be output to the display unit together with the display image, wherein the program causes the information processing device to execute, for one or more individual controllable audio elements related to the display image in accordance with acquired audio control information, image following type audio control of moving an audio source direction in accordance with movement of the display image between the plurality of images in the plurality of different directions, or image non-following type audio control of fixing the audio source direction regardless of the movement of the display image, wherein the audio control information includes all-audio correspondence control information which is control information indicating an audio control form corresponding to the one or more individual controllable audio elements serving as an output target, and wherein the audio control form determines whether to execute the image following type audio control or the image non-following type audio control for each individual controllable audio element or to execute the image following type audio control or the image non-following type audio control collectively for all of the one or more individual controllable audio elements.
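For illustration only, and not as part of the claims, the control selection recited above may be sketched as follows. The sketch assumes a single horizontal angle (azimuth) for the display direction and for each audio source. The names AllAudioControl, AudioElement, follows_image, wrap_deg, and output_azimuths, as well as the numeric setting values, are hypothetical and are not taken from the MP4 or MPD syntax.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class AllAudioControl(Enum):
    # Hypothetical setting values of the all-audio correspondence control information.
    ALL_FOLLOWING = 0      # every element follows the display image
    ALL_NON_FOLLOWING = 1  # every element keeps a fixed direction
    MIXED = 2              # decided per element


@dataclass
class AudioElement:
    name: str
    source_azimuth_deg: float  # direction of the source in the full image
    follows_image: bool        # hypothetical per-element setting, read only in MIXED


def wrap_deg(angle: float) -> float:
    """Wrap an angle in degrees into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def output_azimuths(mode: AllAudioControl,
                    elements: List[AudioElement],
                    view_azimuth_deg: float) -> Dict[str, float]:
    """Return the direction each element is rendered from, relative to the viewer."""
    result = {}
    for element in elements:
        if mode is AllAudioControl.ALL_FOLLOWING:
            follow = True
        elif mode is AllAudioControl.ALL_NON_FOLLOWING:
            follow = False
        else:
            # MIXED: consult the audio element correspondence control information.
            follow = element.follows_image
        if follow:
            # Image following: the source shifts opposite to the view rotation.
            result[element.name] = wrap_deg(
                element.source_azimuth_deg - view_azimuth_deg)
        else:
            # Image non-following: the direction is fixed regardless of the view.
            result[element.name] = element.source_azimuth_deg
    return result
```

In this sketch, an element tied to an object in the image (for example, a barking dog) would carry follows_image=True, while narration or background music would carry follows_image=False and remain in place when the display direction changes.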